# Assessment Problem
Write a Python script that processes sensor data from a zip file (`sensor_data_2025.zip`). The script should:
1. Extract data from nested city zip files (e.g., `munich_data.zip`, `tokyo_data.zip`), reading ``readings.csv`` and `station_info.json` from each.
2. Add a `city` column to each DataFrame with the city's name.
3. Combine the data into two main DataFrames: one for readings and one for station info.
4. Merge these two DataFrames on `station_id`.
5. Clean the merged data by keeping only `active` stations and removing rows with missing `temperature_celsius` or `pm2_5`.
6. Calculate the average temperature and PM2.5 for each city.
7. Save the city summary as a CSV file named `city_summary_report.csv` with columns `city`, `average_temperature`, and `average_pm2_5`.

## TASK 1: Load and combine data

Go through each city's zip file, read the `readings.csv` and `station_info.json` files, add a 'city' column to each DataFrame, and combine them into `all_readings_df` and `all_station_info_df`.


**Reasoning**:
TASK 1 requires extracting data from nested zip files, processing them, and combining them into two dataframes. This involves using the zipfile library and pandas for data manipulation.



In [3]:


#unzip sensor_data
'''import zipfile
def unzip(filename):
    with zipfile.ZipFile(filename,'r') as zf:
        inner_list=zf.namelist()
        zf.extractall()
        return inner_list
inner_list=unzip("sensor_data_2025.zip")
for file in inner_list:
    unzip(file)

#extracting all files
with zipfile.ZipFile("munich_data.zip",'r') as zf:
        zf.extractall()

#extracting all files in tokyo_data
with zipfile.ZipFile("tokyo_data.zip",'r') as z:
        z.extractall()
        print(z.namelist())

import pandas as pd

reading=pd.read_csv("readings.csv")
station_info=pd.read_json("station_info.json")
station_info
station_info['city']=station_info.groupby('city name')
reading['city']=reading.groupby('city name')
'''

import zipfile
import pandas as pd

def unzip(filename):
    with zipfile.ZipFile(filename, 'r') as zf:
        inner_list = zf.namelist()
        zf.extractall()
        return inner_list
inner_list = unzip("sensor_data_2025.zip")
for file in inner_list:
    unzip(file)

cities = ["munich", "tokyo"]

all_readings_df = pd.DataFrame()
all_station_info_df = pd.DataFrame()
for city in cities:
    reading = pd.read_csv("readings.csv")
    station_info = pd.read_json("station_info.json")
    reading['city'] = city
    station_info['city'] = city
    all_readings_df = pd.concat([all_readings_df, reading], ignore_index=True)
    all_station_info_df = pd.concat([all_station_info_df, station_info], ignore_index=True)
all_readings_df
all_station_info_df






   

Unnamed: 0,station_id,location,status,deployment_year,city
0,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich
1,1012,"{'latitude': -33.863872, 'longitude': 151.151055}",active,2023,munich
2,1013,"{'latitude': -33.876501, 'longitude': 151.137366}",active,2018,munich
3,1014,"{'latitude': -33.865011, 'longitude': 151.180531}",active,2020,munich
4,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,munich
5,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,tokyo
6,1012,"{'latitude': -33.863872, 'longitude': 151.151055}",active,2023,tokyo
7,1013,"{'latitude': -33.876501, 'longitude': 151.137366}",active,2018,tokyo
8,1014,"{'latitude': -33.865011, 'longitude': 151.180531}",active,2020,tokyo
9,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,tokyo


## TASK 2: Merge dataframes

Merge the `all_readings_df` and `all_station_info_df` into a single DataFrame using the `station_id` column.


**Reasoning**:
Merge the two dataframes `all_readings_df` and `all_station_info_df` on the `station_id` column and display the head of the resulting dataframe to verify the merge.



In [4]:
merge=pd.merge(all_station_info_df,all_readings_df,on='station_id')
merge

Unnamed: 0,station_id,location,status,deployment_year,city_x,timestamp,temperature_celsius,humidity_percent,pm2_5,city_y
0,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich,2025-02-15 04:00:00,21.86,76.34,59.35,munich
1,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich,2025-02-28 13:00:00,18.90,50.89,21.21,munich
2,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich,2025-02-09 20:00:00,29.05,51.06,59.63,munich
3,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich,2025-01-25 13:00:00,26.43,65.09,39.54,munich
4,1011,"{'latitude': -33.807421, 'longitude': 151.130746}",active,2022,munich,2025-03-16 15:00:00,29.30,46.26,7.25,munich
...,...,...,...,...,...,...,...,...,...,...
1995,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,tokyo,2025-03-01 01:00:00,25.67,52.59,50.15,tokyo
1996,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,tokyo,2025-01-26 08:00:00,28.82,30.79,48.25,tokyo
1997,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,tokyo,2025-03-23 21:00:00,24.31,75.22,38.13,tokyo
1998,1015,"{'latitude': -33.812949, 'longitude': 151.165912}",maintenance,2021,tokyo,2025-01-29 13:00:00,19.50,49.30,60.75,tokyo


## TASK 3: Clean data

Filter the merged DataFrame to keep only 'active' stations and remove rows with missing values in 'temperature_celsius' or 'pm2_5'.


**Reasoning**:
Filter the merged dataframe to keep only active stations and remove rows with missing values in the specified columns.



In [5]:
merge[merge["status"]=="active"]
merge=merge.dropna(subset=['temperature_celsius','pm2_5'],how='any')

## TASK 4: Analyze data

Calculate the average 'temperature_celsius' and 'pm2_5' for each city.


**Reasoning**:
Calculate the mean temperature and PM2.5 for each city by grouping the cleaned_df by city and applying the mean aggregation. Then display the head of the resulting dataframe.



In [6]:

mean_temperature=all_readings_df.groupby('city')[['temperature_celsius','pm2_5']].agg(['mean'])
mean_temperature.head()

Unnamed: 0_level_0,temperature_celsius,pm2_5
Unnamed: 0_level_1,mean,mean
city,Unnamed: 1_level_2,Unnamed: 2_level_2
munich,24.16083,39.20103
tokyo,24.16083,39.20103


## TASK 5: Save report

Save the city summary as a CSV file named `city_summary_report.csv` with the specified columns.


**Reasoning**:
Reset the index, rename the city column, and save the city summary DataFrame to a CSV file.



In [7]:
city_summary_report=pd.concat([merge,mean_temperature],axis=1)
city_summary_report.to_csv("city_summary_report.csv",index=False)


In [8]:
#linear
def linear(arr,target):
    for i in range(len(arr)):
        if arr[i]==target:
            return i
    return -1
nums=[2,1,3,4]
target=30
result=linear(nums,target)

if result !=-1:
    print(f"Target{target} found at {"result"}") 
else:
    print("Target not found")
    
#bubble sort
def bubble(arr):
    for i in range(len(arr)):
        for j in range(len(arr)-1):
            if arr[j]>arr[j+1]:
                arr[j],arr[j+1]=arr[j+1],arr[j]
nums=[2,34,12,3]
bubble(nums)
print(nums)
            


                                                 

Target not found
[2, 3, 12, 34]
