### Tasks

**Viewing and basic operations:**

- Print the first 5 rows of DataFrame df.
- Get a list of DataFrame columns.

**Data filtering:**

- Display all sales made in Moscow.
- Find all sales of product 'A'.

**Sorting data:**

- Sort the DataFrame by the 'Date' column in ascending order.
- Sort the DataFrame by the 'Quantity' column in descending order and print the first 3 rows.

**Grouping and aggregation:**

- Group the sales by the 'City' column and calculate the total number of units sold in each city.
- Group sales by the 'Product' column and calculate the total revenue for each product (Hint: total revenue can be calculated as Quantity * Unit Price).

**Handling missing values:**

- Introduce missing values into the DataFrame. For example, set the quantity in the fourth row to NaN.
- Find all the rows with missing values.

**Filling in missing values:**

- Fill in the missing values in the 'Quantity' column with the average value for this column.
- Fill in the missing values in the 'Quantity' column with the value 10.

**Creating new columns:**

- Add a new column 'Total Price', which will be the product of the columns 'Quantity' and 'Unit Price'.
- Add a new 'Discount' column with a default value of 0.

**Deleting data:**

- Delete all rows where the quantity is less than 10.
- Delete the 'City' column from the DataFrame.

**Applying functions to columns:**

- Write a function that increases the unit price by 10% and apply it to the 'Unit Price' column.
- Apply a function that calculates the length of the product name to the 'Product' column and store the result in a new 'Product Name Length' column.

**Merging DataFrames:**

- Create a new DataFrame with additional product information:
  ```python
  extra_data = {
      'Product': ['A', 'B', 'C', 'D'],
      'Category': ['Electronics', 'Clothing', 'Electronics', 'Furniture']
  }
  extra_df = pd.DataFrame(extra_data)
  ```


In [1189]:
import pandas as pd

data = {
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04', '2024-01-05'],
    'Product': ['A', 'B', 'A', 'C', 'B'],
    'Quantity': [10, 15, 8, 12, 20],
    'Unit Price': [100, 150, 100, 200, 150],
    'City': ['Moscow', 'Saint Petersburg', 'Moscow', 'Yekaterinburg', 'Moscow']
}

df = pd.DataFrame(data)

In [1190]:
df.head(5)

,Date,Product,Quantity,Unit Price,City
0,2024-01-01,A,10,100,Moscow
1,2024-01-02,B,15,150,Saint Petersburg
2,2024-01-03,A,8,100,Moscow
3,2024-01-04,C,12,200,Yekaterinburg
4,2024-01-05,B,20,150,Moscow


In [1191]:
df.columns

Index(['Date', 'Product', 'Quantity', 'Unit Price', 'City'], dtype='object')
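
`df.columns` returns an `Index` object rather than a plain Python list. If an actual list is needed, `tolist()` is one option. The sketch below uses a small hypothetical frame named `sample`, not the notebook's `df`.

```python
import pandas as pd

# Hypothetical two-column frame, for illustration only
sample = pd.DataFrame({"Date": ["2024-01-01"], "Product": ["A"]})

# Convert the Index of column labels into a plain Python list
column_list = sample.columns.tolist()
print(column_list)
```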

In [1192]:
df[df["City"] == "Moscow"]

,Date,Product,Quantity,Unit Price,City
0,2024-01-01,A,10,100,Moscow
2,2024-01-03,A,8,100,Moscow
4,2024-01-05,B,20,150,Moscow


In [1193]:
df[df["Product"] == "A"]

,Date,Product,Quantity,Unit Price,City
0,2024-01-01,A,10,100,Moscow
2,2024-01-03,A,8,100,Moscow
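
The two filters above can also be combined. The following is a minimal sketch, assuming a hypothetical frame `sample` that mirrors the notebook's columns; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical frame mirroring the notebook's columns
sample = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B"],
    "City": ["Moscow", "Saint Petersburg", "Moscow", "Yekaterinburg", "Moscow"],
    "Quantity": [10, 15, 8, 12, 20],
})

# Each condition needs its own parentheses; & is an element-wise AND
moscow_a = sample[(sample["City"] == "Moscow") & (sample["Product"] == "A")]

# isin() keeps rows whose value matches any entry in the list
two_cities = sample[sample["City"].isin(["Moscow", "Yekaterinburg"])]

print(moscow_a)
print(two_cities)
```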


In [1194]:
df.sort_values("Date")

,Date,Product,Quantity,Unit Price,City
0,2024-01-01,A,10,100,Moscow
1,2024-01-02,B,15,150,Saint Petersburg
2,2024-01-03,A,8,100,Moscow
3,2024-01-04,C,12,200,Yekaterinburg
4,2024-01-05,B,20,150,Moscow
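
The 'Date' column holds strings. ISO-formatted dates happen to sort correctly as text, but converting them to datetimes first is usually more robust. A minimal sketch, assuming a hypothetical frame `sample` with dates deliberately out of order:

```python
import pandas as pd

# Hypothetical frame with dates out of order
sample = pd.DataFrame({
    "Date": ["2024-01-03", "2024-01-01", "2024-01-02"],
    "Quantity": [8, 10, 15],
})

# Parse the strings into real datetimes before sorting
sample["Date"] = pd.to_datetime(sample["Date"])
sorted_sample = sample.sort_values("Date").reset_index(drop=True)
print(sorted_sample)
```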


In [1195]:
df.sort_values("Quantity", ascending=False).head(3)

,Date,Product,Quantity,Unit Price,City
4,2024-01-05,B,20,150,Moscow
1,2024-01-02,B,15,150,Saint Petersburg
3,2024-01-04,C,12,200,Yekaterinburg
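
For "top N" questions like this one, `nlargest` is an alternative to sorting and taking the head. The sketch below compares both on a hypothetical frame `sample`; with no tied quantities the results should match.

```python
import pandas as pd

# Hypothetical frame with distinct quantities
sample = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B"],
    "Quantity": [10, 15, 8, 12, 20],
})

# Sort descending, then keep the first three rows
top3_sorted = sample.sort_values("Quantity", ascending=False).head(3)

# Ask directly for the three largest quantities
top3_nlargest = sample.nlargest(3, "Quantity")

print(top3_nlargest)
```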


In [1196]:
df.groupby("City")["Quantity"].sum()

City
Moscow              38
Saint Petersburg    15
Yekaterinburg       12
Name: Quantity, dtype: int64
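
When more than one statistic per group is needed, named aggregation is one option. This is a minimal sketch on a hypothetical frame `sample`; the output column names `total_units` and `n_sales` are illustrative.

```python
import pandas as pd

# Hypothetical frame mirroring the notebook's 'City' and 'Quantity' columns
sample = pd.DataFrame({
    "City": ["Moscow", "Saint Petersburg", "Moscow", "Yekaterinburg", "Moscow"],
    "Quantity": [10, 15, 8, 12, 20],
})

# Each keyword names an output column: (source column, aggregation)
summary = sample.groupby("City").agg(
    total_units=("Quantity", "sum"),
    n_sales=("Quantity", "count"),
)
print(summary)
```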


In [1197]:
df["Total"] = df["Quantity"] * df["Unit Price"]
df.groupby("Product")["Total"].sum()

Product
A    1800
B    5250
C    2400
Name: Total, dtype: int64

In [1198]:
df.loc[3, "Quantity"] = float("NaN")
df

,Date,Product,Quantity,Unit Price,City,Total
0,2024-01-01,A,10.0,100,Moscow,1000
1,2024-01-02,B,15.0,150,Saint Petersburg,2250
2,2024-01-03,A,8.0,100,Moscow,800
3,2024-01-04,C,,200,Yekaterinburg,2400
4,2024-01-05,B,20.0,150,Moscow,3000


In [1199]:
df[df.isnull().any(axis=1)]

,Date,Product,Quantity,Unit Price,City,Total
3,2024-01-04,C,,200,Yekaterinburg,2400
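
Before selecting the affected rows, it can help to count missing values per column. A minimal sketch, assuming a hypothetical frame `sample` with one missing quantity:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing quantity
sample = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Quantity": [10.0, np.nan, 20.0],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Rows that contain at least one missing value
rows_with_missing = sample[sample.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```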


In [1200]:
df["Quantity"] = df["Quantity"].fillna(df["Quantity"].mean())
df

,Date,Product,Quantity,Unit Price,City,Total
0,2024-01-01,A,10.0,100,Moscow,1000
1,2024-01-02,B,15.0,150,Saint Petersburg,2250
2,2024-01-03,A,8.0,100,Moscow,800
3,2024-01-04,C,13.25,200,Yekaterinburg,2400
4,2024-01-05,B,20.0,150,Moscow,3000
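
Because the mean fill leaves no missing values in 'Quantity', a later fill with 10 on the same frame would change nothing. The sketch below compares both strategies side by side on a hypothetical frame `sample`, without modifying it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing quantity
sample = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Quantity": [10.0, np.nan, 20.0],
})

# Strategy 1: fill with the column mean (mean() skips NaN by default)
filled_mean = sample["Quantity"].fillna(sample["Quantity"].mean())

# Strategy 2: fill with the fixed value 10
filled_const = sample["Quantity"].fillna(10)

print(filled_mean.tolist())
print(filled_const.tolist())
```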


In [1201]:
df = df.query("Quantity >= 10")
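
Keeping the rows with a quantity of at least 10 is one way to delete the rest. A boolean mask or an explicit `drop` should give the same result when the column has no missing values. A minimal sketch on a hypothetical frame `sample`:

```python
import pandas as pd

# Hypothetical frame with no missing quantities
sample = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B"],
    "Quantity": [10, 15, 8, 12, 20],
})

# Keep rows that satisfy the condition, using a query string
kept_query = sample.query("Quantity >= 10")

# The same filter written as a boolean mask
kept_mask = sample[sample["Quantity"] >= 10]

# Explicitly drop the rows that fail the condition
kept_drop = sample.drop(sample[sample["Quantity"] < 10].index)

print(kept_query)
```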


In [1202]:
df = df.copy()
df.loc[:,"Total Price"] = df["Quantity"] * df["Unit Price"]
df

,Date,Product,Quantity,Unit Price,City,Total,Total Price
0,2024-01-01,A,10.0,100,Moscow,1000,1000.0
1,2024-01-02,B,15.0,150,Saint Petersburg,2250,2250.0
3,2024-01-04,C,13.25,200,Yekaterinburg,2400,2650.0
4,2024-01-05,B,20.0,150,Moscow,3000,3000.0
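
The 'Discount' task from the list can be handled the same way, by assigning a scalar that pandas broadcasts to every row. A minimal sketch on a hypothetical frame `sample`:

```python
import pandas as pd

# Hypothetical frame with the two columns needed for the calculation
sample = pd.DataFrame({"Quantity": [10, 15], "Unit Price": [100, 150]})

# Column computed from two existing columns
sample["Total Price"] = sample["Quantity"] * sample["Unit Price"]

# A scalar is broadcast to every row, giving a default of 0
sample["Discount"] = 0

print(sample)
```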


In [1203]:
df = df.drop("City", axis=1)


In [1204]:
df["Unit Price"] = df["Unit Price"] + df["Unit Price"] * 0.1
df

,Date,Product,Quantity,Unit Price,Total,Total Price
0,2024-01-01,A,10.0,110.0,1000,1000.0
1,2024-01-02,B,15.0,165.0,2250,2250.0
3,2024-01-04,C,13.25,220.0,2400,2650.0
4,2024-01-05,B,20.0,165.0,3000,3000.0
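
The task asks for a function applied to the column, whereas the cell above uses column arithmetic directly. A function-based variant might look like the sketch below; the name `increase_price` and its `percent` parameter are hypothetical, as is the frame `sample`.

```python
import pandas as pd


def increase_price(price, percent=10):
    """Return the price raised by the given percentage."""
    return price + price * percent / 100


# Hypothetical frame with the notebook's unit prices
sample = pd.DataFrame({
    "Product": ["A", "B", "C"],
    "Unit Price": [100, 150, 200],
})

# apply() calls the function once per value in the column
sample["Unit Price"] = sample["Unit Price"].apply(increase_price)
print(sample)
```

For simple arithmetic like this, the vectorized form is generally faster; `apply` is mainly useful when the logic does not reduce to a column expression.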


In [1205]:
df.loc[:, "Product Name Length"] = df["Product"].apply(len)


In [1206]:
df

,Date,Product,Quantity,Unit Price,Total,Total Price,Product Name Length
0,2024-01-01,A,10.0,110.0,1000,1000.0,1
1,2024-01-02,B,15.0,165.0,2250,2250.0,1
3,2024-01-04,C,13.25,220.0,2400,2650.0,1
4,2024-01-05,B,20.0,165.0,3000,3000.0,1
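
`apply(len)` raises a `TypeError` if a product name is missing, because `len` is not defined for NaN. The `.str.len()` accessor is one alternative that leaves missing names as missing. A minimal sketch on a hypothetical Series `products`:

```python
import numpy as np
import pandas as pd

# Hypothetical product names, one of them missing
products = pd.Series(["A", "Widget", np.nan], name="Product")

# Vectorized string length; the missing name stays missing
lengths = products.str.len()
print(lengths)
```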


In [1207]:
extra_data = {
    'Product': ['A', 'B', 'C', 'D'],
    'Category': ['Electronics', 'Clothing', 'Electronics', 'Furniture']
}
extra_df = pd.DataFrame(extra_data)

In [1208]:
merged_df = pd.merge(df, extra_df, on="Product")

In [1209]:
merged_df

,Date,Product,Quantity,Unit Price,Total,Total Price,Product Name Length,Category
0,2024-01-01,A,10.0,110.0,1000,1000.0,1,Electronics
1,2024-01-02,B,15.0,165.0,2250,2250.0,1,Clothing
2,2024-01-04,C,13.25,220.0,2400,2650.0,1,Electronics
3,2024-01-05,B,20.0,165.0,3000,3000.0,1,Clothing
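
`pd.merge` performs an inner join by default, so rows whose product has no match in the other frame are dropped. To keep every sales row, `how="left"` can be used, and `indicator=True` shows where each row came from. A minimal sketch, assuming hypothetical frames `sales` and `extra` in which product 'E' has no category:

```python
import pandas as pd

# Hypothetical sales frame; product 'E' is absent from the lookup table
sales = pd.DataFrame({"Product": ["A", "B", "E"], "Quantity": [10, 15, 5]})
extra = pd.DataFrame({
    "Product": ["A", "B", "C", "D"],
    "Category": ["Electronics", "Clothing", "Electronics", "Furniture"],
})

# Default inner join: only products present in both frames survive
inner = pd.merge(sales, extra, on="Product")

# Left join keeps all sales rows; unmatched ones get NaN for 'Category'
left = pd.merge(sales, extra, on="Product", how="left", indicator=True)

print(inner)
print(left)
```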
