# Assessment Problem
Write a Python script that processes sensor data from a zip file (`sensor_data_2025.zip`). The script should:
1. Extract data from nested city zip files (e.g., `munich_data.zip`, `tokyo_data.zip`), reading `readings.csv` and `station_info.json` from each.
2. Add a `city` column to each DataFrame with the city's name.
3. Combine the data into two main DataFrames: one for readings and one for station info.
4. Merge these two DataFrames on `station_id`.
5. Clean the merged data by keeping only `active` stations and removing rows with missing `temperature_celsius` or `pm2_5`.
6. Calculate the average temperature and PM2.5 for each city.
7. Save the city summary as a CSV file named `city_summary_report.csv` with columns `city`, `average_temperature`, and `average_pm2_5`.

## TASK 1: Load and combine data

Go through each city's zip file, read the `readings.csv` and `station_info.json` files, add a 'city' column to each DataFrame, and combine them into `all_readings_df` and `all_station_info_df`.


**Reasoning**:
TASK 1 requires extracting data from nested zip files, processing them, and combining them into two dataframes. This involves using the zipfile library and pandas for data manipulation.



In [3]:
# Load readings and station info from the nested city zip files
import io
import os
import zipfile
import pandas as pd

readings_frames = []
station_frames = []

with zipfile.ZipFile("sensor_data_2025.zip") as outer:
    for name in outer.namelist():
        if not name.endswith(".zip"):
            continue
        # "munich_data.zip" -> "munich"
        city_name = os.path.basename(name).split("_")[0]
        # Open the inner zip in memory so files from different cities
        # are never extracted over each other on disk
        with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
            for f in inner.namelist():
                if f.endswith("readings.csv"):
                    with inner.open(f) as fh:
                        df = pd.read_csv(fh)
                    df["city"] = city_name
                    readings_frames.append(df)
                elif f.endswith("station_info.json"):
                    with inner.open(f) as fh:
                        df = pd.read_json(fh)
                    df["city"] = city_name
                    station_frames.append(df)

# Concatenate the per-city frames into the two combined DataFrames
all_readings_df = pd.concat(readings_frames, ignore_index=True)
all_station_info_df = pd.concat(station_frames, ignore_index=True)

print(list(all_readings_df.columns))
print(list(all_station_info_df.columns))


['station_id', 'timestamp', 'temperature_celsius', 'humidity_percent', 'pm2_5', 'city']
['station_id', 'location', 'status', 'deployment_year', 'city']


## TASK 2: Merge dataframes

Merge the `all_readings_df` and `all_station_info_df` into a single DataFrame using the `station_id` column.


**Reasoning**:
Merge the two dataframes `all_readings_df` and `all_station_info_df` on the `station_id` column and display the head of the resulting dataframe to verify the merge.



In [None]:
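# WRITE YOUR CODE HERE
# Merge readings with station info, then clean the result

# Both frames carry a `city` column, so it is included in the merge key;
# this avoids duplicate `city_x`/`city_y` columns in the result
merged_data = pd.merge(all_readings_df, all_station_info_df, on=["station_id", "city"])
print(merged_data.head())

# Clean: keep only active stations
cleaned_df = merged_data[merged_data["status"] == "active"]

# Remove rows missing either measurement (how="any", not how="all")
cleaned_df = cleaned_df.dropna(subset=["temperature_celsius", "pm2_5"], how="any")
cleaned_df.head()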


## TASK 3: Clean data

Filter the merged DataFrame to keep only 'active' stations and remove rows with missing values in 'temperature_celsius' or 'pm2_5'.


**Reasoning**:
Filter the merged dataframe to keep only active stations and remove rows with missing values in the specified columns.



In [1]:
# WRITE YOUR CODE HERE
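
The filtering described above can be sketched on a small stand-in frame. The `merged_data` values below are invented for illustration and are not taken from the sensor files; only the column names come from the task. Note that "missing `temperature_celsius` or `pm2_5`" corresponds to `how="any"` in `dropna`, since `how="all"` would only drop rows where both values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical merged data; the values are invented for illustration
merged_data = pd.DataFrame({
    "station_id": ["S1", "S2", "S3", "S4"],
    "city": ["munich", "munich", "tokyo", "tokyo"],
    "status": ["active", "inactive", "active", "active"],
    "temperature_celsius": [12.5, 11.0, np.nan, 20.0],
    "pm2_5": [8.0, 9.0, 15.0, np.nan],
})

# Keep only rows that belong to active stations
cleaned_df = merged_data[merged_data["status"] == "active"]

# Drop rows where either measurement is missing
cleaned_df = cleaned_df.dropna(subset=["temperature_celsius", "pm2_5"], how="any")

print(cleaned_df["station_id"].tolist())
```

Here only `S1` survives: `S2` is inactive, `S3` lacks a temperature, and `S4` lacks a PM2.5 value.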

## TASK 4: Analyze data

Calculate the average 'temperature_celsius' and 'pm2_5' for each city.


**Reasoning**:
Calculate the mean temperature and PM2.5 for each city by grouping the cleaned_df by city and applying the mean aggregation. Then display the head of the resulting dataframe.



In [2]:
# WRITE YOUR CODE HERE
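
One possible way to compute the per-city means is a `groupby` on `city` followed by `mean()` on the two measurement columns. The `cleaned_df` below is a hypothetical stand-in with made-up values, and `city_summary` is a name chosen for this sketch.

```python
import pandas as pd

# Hypothetical cleaned data; the values are invented for illustration
cleaned_df = pd.DataFrame({
    "city": ["munich", "munich", "tokyo", "tokyo"],
    "temperature_celsius": [10.0, 14.0, 20.0, 22.0],
    "pm2_5": [8.0, 12.0, 15.0, 25.0],
})

# Group by city and average the two measurement columns;
# the result is indexed by city
city_summary = cleaned_df.groupby("city")[["temperature_celsius", "pm2_5"]].mean()

print(city_summary.head())
```

Selecting the two columns before calling `mean()` keeps non-numeric columns such as `timestamp` or `location` out of the aggregation.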

## TASK 5: Save report

Save the city summary as a CSV file named `city_summary_report.csv` with the specified columns.


**Reasoning**:
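Reset the index so that `city` becomes a regular column, rename the averaged columns to `average_temperature` and `average_pm2_5`, and save the city summary DataFrame to `city_summary_report.csv` without the index.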



In [3]:
# WRITE YOUR CODE HERE
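
A minimal sketch of the save step, assuming the summary is indexed by `city` as a `groupby(...).mean()` would produce. The `city_summary` values are invented, and the file is written to a temporary directory here only to keep the example side-effect free; in the notebook the path would simply be `city_summary_report.csv`.

```python
import os
import tempfile
import pandas as pd

# Hypothetical per-city means, indexed by city; values are invented
city_summary = pd.DataFrame(
    {"temperature_celsius": [12.0, 21.0], "pm2_5": [10.0, 20.0]},
    index=pd.Index(["munich", "tokyo"], name="city"),
)

# Turn the city index into a column and rename to the required headers
report = city_summary.reset_index().rename(
    columns={
        "temperature_celsius": "average_temperature",
        "pm2_5": "average_pm2_5",
    }
)

# Write without the index so the CSV has exactly three columns
out_path = os.path.join(tempfile.mkdtemp(), "city_summary_report.csv")
report.to_csv(out_path, index=False)

# Read the file back to check the header
saved = pd.read_csv(out_path)
print(saved.columns.tolist())
```

Passing `index=False` matters: without it, pandas writes the row index as an extra unnamed first column.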