
- [True] Mostafa Saad
- [~]  How to detect and remove outliers using pandas
- [True]  What is Simple imputer --> a class in the sklearn. impute module that can be used to replace missing values in a dataset, using a variety of input strategies.

**Concepts of clean code**
1. **Meaningful Names**: Names should reveal the intent and purpose of the code.

2. **Small Functions and Methods**: Keep functions and methods small, ideally doing only one thing.

3. **Comments and Documentation**: Use comments sparingly and only when necessary. The code should be self-explanatory through well-chosen names and structure.

4. **Formatting and Consistency**: Follow a consistent code formatting style. This makes it easier for developers to read and work with the code.

5. **Avoid Duplication (DRY)**: Don't repeat yourself. Duplication in code can lead to maintenance challenges. Instead, encapsulate common logic in functions or classes for reusability.

6. **Single Responsibility Principle (SRP)**: Each class or module should have a single responsibility.

7. **Open/Closed Principle (OCP)**: Code should be open for extension but closed for modification. This principle encourages using interfaces, abstract classes, and polymorphism to add new functionality without changing existing code.

8. **Liskov Substitution Principle (LSP)**: Subtypes must be substitutable for their base types. In other words, derived classes should extend the behavior of their base classes without breaking functionality.

9. **Interface Segregation Principle (ISP)**: Clients should not be forced to depend on interfaces they do not use.

10. **Dependency Inversion Principle (DIP)**: High-level modules should not depend on low-level modules; both should depend on abstractions. This principle encourages using dependency injection and inversion of control to decouple components.

11. **Testing**: Clean code is testable code.

12. **YAGNI (You Ain't Gonna Need It)**: Avoid adding unnecessary features or code.

13. **Refactoring**: Continuously improve your code through refactoring.

14. **Code Review**: Encourage and participate in code reviews to ensure that the codebase adheres to clean code principles and that errors and issues are caught early.

15. **SOLID Principles**: Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion.

# **How to read all sheets of excel file using pandas**

In [8]:
import pandas as pd
df = pd.read_excel("/content/SuperStoreUS-2015.xlsx", sheet_name=None)
df

{'Orders':       Row ID Order Priority  Discount  Unit Price  Shipping Cost  Customer ID  \
 0      20847           High      0.01        2.84           0.93            3   
 1      20228  Not Specified      0.02      500.98          26.00            5   
 2      21776       Critical      0.06        9.48           7.29           11   
 3      24844         Medium      0.09       78.69          19.99           14   
 4      24846         Medium      0.08        3.28           2.31           14   
 ...      ...            ...       ...         ...            ...          ...   
 1947   19842           High      0.01       10.90           7.46         3397   
 1948   19843           High      0.10        7.99           5.03         3397   
 1949   26208  Not Specified      0.08       11.97           5.81         3399   
 1950   24911         Medium      0.10        9.38           4.93         3400   
 1951   25914           High      0.10      105.98          13.99         3403   
 
    

# **How to read random rows from Data using pandas**

In [12]:
df2 = pd.read_excel("/content/SuperStoreUS-2015.xlsx", sheet_name=0)
df2.sample(10)

Unnamed: 0,Row ID,Order Priority,Discount,Unit Price,Shipping Cost,Customer ID,Customer Name,Ship Mode,Customer Segment,Product Category,...,Region,State or Province,City,Postal Code,Order Date,Ship Date,Profit,Quantity ordered new,Sales,Order ID
1295,24952,Low,0.06,3.74,0.94,2334,Stephanie Hawkins,Regular Air,Home Office,Office Supplies,...,Central,Wisconsin,Greenfield,53220,2015-06-02,2015-06-09,-7.685,12,44.75,89610
1063,18159,Low,0.06,3.58,1.63,1933,William Crawford,Regular Air,Corporate,Office Supplies,...,Central,Texas,Garland,75043,2015-04-19,2015-04-23,14.0,10,34.76,86687
401,5142,High,0.1,9.71,9.45,699,Jenny Gold,Regular Air,Consumer,Office Supplies,...,West,California,Los Angeles,90041,2015-06-30,2015-07-03,-119.77,27,261.93,36647
230,23862,High,0.09,200.98,55.96,445,Judy Barrett,Delivery Truck,Small Business,Furniture,...,Central,Nebraska,Norfolk,68701,2015-06-23,2015-06-24,-512.872,9,1766.68,88084
496,5421,Low,0.02,1.14,0.7,894,Gail Rankin Cole,Regular Air,Corporate,Office Supplies,...,East,District of Columbia,Washington,20024,2015-02-02,2015-02-02,-0.49,38,44.85,38529
1234,24113,Critical,0.0,100.89,42.0,2225,Sean McKenna,Delivery Truck,Small Business,Furniture,...,West,New Mexico,Hobbs,88240,2015-02-21,2015-02-22,1500.12,15,1608.11,89970
703,20433,Medium,0.04,205.99,5.0,1237,Eva Simpson,Express Air,Corporate,Technology,...,Central,Texas,Carrollton,75007,2015-05-25,2015-05-26,13.9568,11,1878.24,86077
1095,21692,Not Specified,0.05,20.99,3.3,1979,Marianne Weiner Ennis,Regular Air,Corporate,Technology,...,West,Colorado,Littleton,80122,2015-05-05,2015-05-06,21.8834,4,72.75,87757
53,6243,Not Specified,0.04,160.98,30.0,94,Eddie House Mueller,Delivery Truck,Home Office,Furniture,...,Central,Illinois,Chicago,60601,2015-05-03,2015-05-05,116.1,37,6276.34,44231
1432,22840,Not Specified,0.02,178.47,19.99,2540,Helen Ferguson,Regular Air,Home Office,Office Supplies,...,South,Florida,Winter Springs,32708,2015-04-07,2015-04-08,106.9848,1,193.81,91017


# **How to save last 3 rows for 2 columns only**

In [21]:
l3rows = df2[["Order Priority","Shipping Cost"]].tail(3)
l3rows.to_excel("last 3 rows.xlsx")
l3rows

Unnamed: 0,Order Priority,Shipping Cost
1949,Not Specified,5.81
1950,Medium,4.93
1951,High,13.99


# **How to append to an existing csv file or excel file in same sheet**

In [26]:
writer = pd.ExcelWriter("/content/last 3 rows.xlsx", engine="openpyxl", mode = "a", if_sheet_exists="overlay")
l3rows.to_excel(writer)
writer.save()

  writer.save()


# **What are the sampling techniques?**

**Probability Sampling Techniques**

1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling

**Non-Probability Sampling Techniques**

1. Convenience Sampling
2. Voluntary Response Sampling
3. Purposive Sampling
4. Snowball Sampling

# **How to drop rows with specific condition**

In [29]:
df2.info()
df2.drop(df2[(df2['unit price'] < 5)], inplace= True)
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1952 entries, 0 to 1951
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Row ID                1952 non-null   int64         
 1   Order Priority        1952 non-null   object        
 2   Discount              1952 non-null   float64       
 3   Unit Price            1952 non-null   float64       
 4   Shipping Cost         1952 non-null   float64       
 5   Customer ID           1952 non-null   int64         
 6   Customer Name         1952 non-null   object        
 7   Ship Mode             1952 non-null   object        
 8   Customer Segment      1952 non-null   object        
 9   Product Category      1952 non-null   object        
 10  Product Sub-Category  1952 non-null   object        
 11  Product Container     1952 non-null   object        
 12  Product Name          1952 non-null   object        
 13  Product Base Margi

KeyError: ignored

# **Replace data in specific column with specific values**

In [31]:
df2.loc[ df2["Ship Mode"] == "Regular Air", "Ship Mode"] = "slow air"
df2

Unnamed: 0,Row ID,Order Priority,Discount,Unit Price,Shipping Cost,Customer ID,Customer Name,Ship Mode,Customer Segment,Product Category,...,Region,State or Province,City,Postal Code,Order Date,Ship Date,Profit,Quantity ordered new,Sales,Order ID
0,20847,High,0.01,2.84,0.93,3,Bonnie Potter,Express Air,Corporate,Office Supplies,...,West,Washington,Anacortes,98221,2015-01-07,2015-01-08,4.5600,4,13.01,88522
1,20228,Not Specified,0.02,500.98,26.00,5,Ronnie Proctor,Delivery Truck,Home Office,Furniture,...,West,California,San Gabriel,91776,2015-06-13,2015-06-15,4390.3665,12,6362.85,90193
2,21776,Critical,0.06,9.48,7.29,11,Marcus Dunlap,slow air,Home Office,Furniture,...,East,New Jersey,Roselle,7203,2015-02-15,2015-02-17,-53.8096,22,211.15,90192
3,24844,Medium,0.09,78.69,19.99,14,Gwendolyn F Tyson,slow air,Small Business,Furniture,...,Central,Minnesota,Prior Lake,55372,2015-05-12,2015-05-14,803.4705,16,1164.45,86838
4,24846,Medium,0.08,3.28,2.31,14,Gwendolyn F Tyson,slow air,Small Business,Office Supplies,...,Central,Minnesota,Prior Lake,55372,2015-05-12,2015-05-13,-24.0300,7,22.23,86838
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1947,19842,High,0.01,10.90,7.46,3397,Andrea Shaw,slow air,Small Business,Office Supplies,...,Central,Illinois,Danville,61832,2015-03-11,2015-03-12,-116.7600,18,207.31,87536
1948,19843,High,0.10,7.99,5.03,3397,Andrea Shaw,slow air,Small Business,Technology,...,Central,Illinois,Danville,61832,2015-03-11,2015-03-12,-160.9520,22,143.12,87536
1949,26208,Not Specified,0.08,11.97,5.81,3399,Marvin Reid,slow air,Small Business,Office Supplies,...,Central,Illinois,Des Plaines,60016,2015-03-29,2015-03-31,-41.8700,5,59.98,87534
1950,24911,Medium,0.10,9.38,4.93,3400,Florence Gold,Express Air,Small Business,Furniture,...,East,West Virginia,Fairmont,26554,2015-04-04,2015-04-04,-24.7104,15,135.78,87537
