# Data Analysis Exercises

This notebook contains tasks for practicing **pandas** basics.
Fill in the empty code cells with your own solutions.

## Task 1: Series with city names and population
- Create a `Series` with 5 cities and their population.
- Print all values.
- Find the city with the largest population.

In [1]:
import pandas as pd

data = {"London": 9_840_740, "Munich": 1_456_039, "Warsaw": 1_800_230, "Paris": 11_346_800, "Madrid": 6_810_530}
cities_population = pd.Series(data)
print(cities_population.apply(lambda x: f"{x:,}"))
print(cities_population.idxmax())

London     9,840,740
Munich     1,456,039
Warsaw     1,800,230
Paris     11,346,800
Madrid     6,810,530
dtype: object
Paris
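`idxmax()` returns only the index label, not the value. The sketch below shows one possible way to get the city together with its population and to rank the largest cities with `nlargest`. The names `top_city`, `top_population`, and `top_three` are illustrative and not part of the exercise.

```python
import pandas as pd

# Same figures as in the Task 1 cell.
cities_population = pd.Series(
    {
        "London": 9_840_740,
        "Munich": 1_456_039,
        "Warsaw": 1_800_230,
        "Paris": 11_346_800,
        "Madrid": 6_810_530,
    }
)

# idxmax() gives the label; .loc looks up the matching value.
top_city = cities_population.idxmax()
top_population = int(cities_population.loc[top_city])
print(f"{top_city}: {top_population:,}")

# nlargest(n) returns the n largest values, sorted in descending order.
top_three = cities_population.nlargest(3)
print(top_three)
```

Keeping the Series numeric and formatting only at print time means later calculations such as `sum()` or `mean()` still work on it.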


## Task 2: DataFrame with products
- Create a `DataFrame` with 5 products: name, price, quantity.
- Add a column **Total** = price × quantity.
- Calculate the grand total across all products (the sum of the **Total** column).

In [2]:
data2 = {
    "Apple": {"price": 1.1, "quantity": 10},
    "Cucumber": {"price": 2.3, "quantity": 15},
    "Tomato": {"price": 3.2, "quantity": 13},
    "Potato": {"price": 0.8, "quantity": 23},
    "Garlic": {"price": 4.1, "quantity": 17},
}
df = pd.DataFrame.from_dict(data2, orient="index")
df["Total"] = df["price"] * df["quantity"]
print(df["Total"].sum())
df

175.2


,price,quantity,Total
Apple,1.1,10,11.0
Cucumber,2.3,15,34.5
Tomato,3.2,13,41.6
Potato,0.8,23,18.4
Garlic,4.1,17,69.7
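The solution above stores the product name in the index. Since the task lists name as one of the fields, an alternative reading is to keep it as a regular column. The following is a minimal sketch of that variant, assuming a list of records as input. The column name `name` and the variables `products` and `grand_total` are illustrative choices.

```python
import pandas as pd

# Same products as in the Task 2 cell, one record (dict) per row.
records = [
    {"name": "Apple", "price": 1.1, "quantity": 10},
    {"name": "Cucumber", "price": 2.3, "quantity": 15},
    {"name": "Tomato", "price": 3.2, "quantity": 13},
    {"name": "Potato", "price": 0.8, "quantity": 23},
    {"name": "Garlic", "price": 4.1, "quantity": 17},
]
products = pd.DataFrame(records)

# Element-wise product of two columns gives the per-row total.
products["Total"] = products["price"] * products["quantity"]

# Round when reporting to hide floating-point noise.
grand_total = products["Total"].sum()
print(round(grand_total, 2))
```

With the name in a column, `products.set_index("name")` recovers the index-based layout if it is needed later.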


## Task 3: Working with CSV
- Load the file `sales.csv`.
- Show the first 5 rows.
- Calculate the average revenue (column `revenue`).

In [4]:
df = pd.read_csv("csv/sales.csv")
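df.head(5)  # not rendered here: a cell displays only its last expression; wrap in print() to show the rows
# Single quotes inside the f-string keep this valid on Python versions before 3.12.
print(f"{df['revenue'].mean():.2f}")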

222.08
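Because `sales.csv` is not included here, the sketch below runs the same steps on an in-memory CSV built with `io.StringIO`. The five rows are copied from the first rows of the Task 4 output, so the mean differs from the full-file result. The names `csv_text`, `sales`, and `mean_revenue` are illustrative.

```python
import io

import pandas as pd

# Five rows taken from the Task 4 output; a stand-in for csv/sales.csv.
csv_text = """date,region,product,quantity,price,revenue
2023-02-08,East,Product D,3,64.33,193.0
2023-02-21,East,Product B,11,26.36,290.0
2023-01-29,West,Product A,11,18.55,204.0
2023-01-15,South,Product D,18,20.06,361.0
2023-02-12,South,Product E,15,13.60,204.0
"""

# read_csv accepts any file-like object, not only a path.
sales = pd.read_csv(io.StringIO(csv_text))

# print() makes the preview visible even when it is not the last expression.
print(sales.head(5))

mean_revenue = sales["revenue"].mean()
print(f"{mean_revenue:.2f}")  # 250.40 for these five rows
```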


## Task 4 (Optional for Portfolio): Function `analyze_dataframe(df)`
- Write a function that returns a dictionary with:
  - number of rows,
  - number of columns,
  - mean values of numeric columns.
- Use it to analyze `sales.csv`.

In [5]:
from pandas import DataFrame

df = pd.read_csv("csv/sales.csv", index_col="date")

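def analyze_dataframe(df: DataFrame) -> dict:
    """Return the row count, column count, and means of the numeric columns."""
    return {
        "Number of rows:": df.shape[0],
        "Number of columns:": df.shape[1],
        # numeric_only=True skips text columns such as region and product.
        "Mean value of numeric columns:": df.mean(numeric_only=True),
    }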
print(analyze_dataframe(df))
df

{'Number of rows:': 200, 'Number of columns:': 5, 'Mean value of numeric columns:': quantity     10.025000
price        42.821106
revenue     222.075377
dtype: float64}


date,region,product,quantity,price,revenue
2023-02-08,East,Product D,3,64.33,193.0
2023-02-21,East,Product B,11,26.36,290.0
2023-01-29,West,Product A,11,18.55,204.0
2023-01-15,South,Product D,18,20.06,361.0
2023-02-12,South,Product E,15,13.60,204.0
...,...,...,...,...,...
2023-02-12,South,Product E,16,7.62,122.0
2023-01-29,East,Product B,8,18.62,149.0
2023-02-05,South,Product B,4,14.00,56.0
2023-01-13,North,Product B,8,1.25,10.0
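The function above returns the means as a pandas `Series` nested inside the dictionary, which is why the printed result spans several lines. If a plain dictionary is preferred, for example to serialize it as JSON, one option is to convert the `Series` with `to_dict()`. The sketch below is a hypothetical variant: the function name `analyze_dataframe_plain`, its keys, and the small `sample` frame are assumptions for illustration, not part of the exercise.

```python
import json

import pandas as pd


def analyze_dataframe_plain(df: pd.DataFrame) -> dict:
    """Hypothetical variant that returns only plain Python values."""
    return {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        # to_dict() turns the Series of means into {column: mean}.
        "numeric_means": df.mean(numeric_only=True).round(2).to_dict(),
    }


# Small illustrative frame; the real exercise uses csv/sales.csv.
sample = pd.DataFrame(
    {
        "region": ["East", "West", "South"],
        "quantity": [3, 11, 18],
        "revenue": [193.0, 204.0, 361.0],
    }
)

summary = analyze_dataframe_plain(sample)
print(json.dumps(summary, indent=2))
```

Rounding inside the function is a presentation choice. Dropping `.round(2)` keeps full precision for further calculations.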
