# Weather Data Analysis Project

This notebook walks through loading, processing, and analyzing daily weather data to compute summary statistics: average temperature and humidity, plus minimum and maximum temperature.

---


In [1]:
import csv
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


## Data Loading

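Read the raw weather data with the `read_csv(source)` function. It accepts either a file path or an already-open text file object (such as `StringIO`) and returns every row, header included, as a list of strings.


In [2]:
def read_csv(source):
    """Return all rows of a CSV source as lists of strings."""
    # File-like objects (e.g. StringIO) are read directly; open() only accepts paths
    if hasattr(source, 'read'):
        return list(csv.reader(source))
    # newline='' is what the csv module documentation recommends when opening files
    with open(source, 'r', newline='') as f:
        return list(csv.reader(f))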

# For demonstration purposes, we create sample data using StringIO
from io import StringIO

csv_data = """Date,Temperature,Humidity
2018-01-01,30,45
2018-01-02,32,50
2018-01-03,31,55
2018-01-04,29,60
2018-01-05,28,65
"""

# Simulate reading from a file
data_2018 = read_csv(StringIO(csv_data))

# Display the data
print(data_2018)
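
Since pandas is already imported, the same sample could also be loaded straight into a DataFrame. The sketch below is one possible approach rather than part of the original pipeline: it assumes the same three-column layout, and `df_2018` is a name introduced only for illustration. `pd.read_csv` accepts a path or a file-like object, and `parse_dates` converts the `Date` column while reading.

```python
from io import StringIO

import pandas as pd

# Same sample rows as the cell above
csv_data = """Date,Temperature,Humidity
2018-01-01,30,45
2018-01-02,32,50
2018-01-03,31,55
2018-01-04,29,60
2018-01-05,28,65
"""

# parse_dates turns the Date column into datetime64 values while reading;
# numeric columns are type-inferred automatically (integers here).
df_2018 = pd.read_csv(StringIO(csv_data), parse_dates=["Date"])
print(df_2018.dtypes)
```

Unlike the list-of-lists returned by `read_csv(source)`, this keeps the header as column names, so later steps can refer to `df_2018["Temperature"]` instead of `row[1]`.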


## Data Processing

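Calculate the average temperature and humidity with the `process_weather_data(data)` function. It skips the header row, averages the Temperature and Humidity columns, prints both averages, and returns them.


In [3]:
def process_weather_data(data):
    total_temp = 0.0
    total_humidity = 0.0
    count = 0
    # Skip header row; float() also accepts decimal readings such as "30.5"
    for row in data[1:]:
        total_temp += float(row[1])
        total_humidity += float(row[2])
        count += 1

    # Guard against input that contains only the header row
    if count == 0:
        raise ValueError("No data rows to average")

    avg_temp = total_temp / count
    avg_humidity = total_humidity / count

    print(f"Average Temperature: {avg_temp}")
    print(f"Average Humidity: {avg_humidity}")
    return avg_temp, avg_humidity

avg_temp, avg_humidity = process_weather_data(data_2018)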
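
For comparison, a minimal pandas sketch of the same averaging step, assuming the sample data above; `weather_df` and `averages` are hypothetical names. `DataFrame.mean()` averages each selected column and skips `NaN` values by default, so no manual total or counter is needed.

```python
from io import StringIO

import pandas as pd

csv_data = """Date,Temperature,Humidity
2018-01-01,30,45
2018-01-02,32,50
2018-01-03,31,55
2018-01-04,29,60
2018-01-05,28,65
"""

weather_df = pd.read_csv(StringIO(csv_data))

# mean() returns a Series indexed by column name
averages = weather_df[["Temperature", "Humidity"]].mean()
print(f"Average Temperature: {averages['Temperature']}")
print(f"Average Humidity: {averages['Humidity']}")
```

On the sample this prints 30.0 and 55.0, the same averages as the loop-based function.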


## Statistical Analysis

Find the minimum and maximum temperatures. `get_min_max(data)` returns them as a tuple, skipping rows whose temperature is missing or cannot be parsed, and `calculate_statistics(data)` prints them.


In [4]:
def get_min_max(data):
    min_temp = float('inf')
    max_temp = float('-inf')
    for row in data[1:]:  # Skip header
        try:
            temp = int(row[1])
            if temp < min_temp:
                min_temp = temp
            if temp > max_temp:
                max_temp = temp
        except (ValueError, IndexError):
            # Skip rows with invalid data
            continue
    return min_temp, max_temp

def calculate_statistics(data):
    min_temp, max_temp = get_min_max(data)
    print(f"Min Temperature: {min_temp}")
    print(f"Max Temperature: {max_temp}")

calculate_statistics(data_2018)
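
If the dates of the extremes are also of interest, pandas can report them alongside the values. A sketch under the same assumptions as before (the sample data and a hypothetical `weather_df`): `Series.agg` with a list of function names returns both statistics at once, and `idxmin`/`idxmax` return the row label of each extreme, which is then used to look up its date.

```python
from io import StringIO

import pandas as pd

csv_data = """Date,Temperature,Humidity
2018-01-01,30,45
2018-01-02,32,50
2018-01-03,31,55
2018-01-04,29,60
2018-01-05,28,65
"""

weather_df = pd.read_csv(StringIO(csv_data), parse_dates=["Date"])
temps = weather_df["Temperature"]

# agg(["min", "max"]) returns a Series indexed by "min" and "max"
extremes = temps.agg(["min", "max"])

# idxmin/idxmax give the row label of the extreme value
min_date = weather_df.loc[temps.idxmin(), "Date"]
max_date = weather_df.loc[temps.idxmax(), "Date"]

print(f"Min Temperature: {extremes['min']} on {min_date.date()}")
print(f"Max Temperature: {extremes['max']} on {max_date.date()}")
```

Like `get_min_max`, pandas ignores missing values here by default, but it does so only for values already stored as `NaN`; text entries would first need converting to numbers.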


## Data Transformation and Utilities

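Use the `to_float(value)` function to convert raw string values to floats. Values that cannot be parsed, including empty strings, become `None` instead of raising an error, so the result is always either a number or an explicit missing value.


In [5]:
def to_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        # Catch only conversion errors (a bare except would also swallow KeyboardInterrupt)
        return None

# Example usage
values = ["42", "not_a_number", "3.14", ""]
converted_values = [to_float(v) for v in values]
print(converted_values)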
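
For whole columns, pandas provides a vectorized counterpart: `pd.to_numeric` with `errors="coerce"` maps anything it cannot parse to `NaN`. A minimal sketch on the same example values; because `NaN` is itself a float, the result stays numeric and pandas aggregations such as `mean()` skip it by default.

```python
import pandas as pd

values = ["42", "not_a_number", "3.14", ""]

# errors="coerce" turns unparseable entries into NaN instead of raising
numeric = pd.to_numeric(pd.Series(values), errors="coerce")
print(numeric.tolist())
```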


## Data Export

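Save the computed averages to `data/processed/weather_stats_final.csv`. The write is commented out so the notebook runs without touching the file system; because `to_csv` does not create missing directories, the `data/processed/` folder must exist before saving.


In [6]:
import os

# Prepare processed data: a single row holding both averages
rows = data_2018[1:]  # skip header
processed_data = {
    'Average Temperature': [sum(float(row[1]) for row in rows) / len(rows)],
    'Average Humidity': [sum(float(row[2]) for row in rows) / len(rows)]
}

df_processed = pd.DataFrame(processed_data)
print(df_processed)

# Save to CSV (uncomment the following lines to create the folder and save the file)
# os.makedirs('data/processed', exist_ok=True)
# df_processed.to_csv('data/processed/weather_stats_final.csv', index=False)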
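
A sketch of the export step that also creates the output folder, assuming the relative `data/processed/` layout named above. A temporary directory stands in for the project root so the example can run anywhere, and the averages are the values computed from the sample data, hard-coded here so the block is self-contained. `Path.mkdir(parents=True, exist_ok=True)` creates any missing folders and does nothing if they already exist.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Averages from the sample data, hard-coded for this sketch
df_processed = pd.DataFrame({"Average Temperature": [30.0], "Average Humidity": [55.0]})

# The temporary directory plays the role of the project root (assumption)
with tempfile.TemporaryDirectory() as base_dir:
    out_path = Path(base_dir) / "data" / "processed" / "weather_stats_final.csv"
    # Create data/processed/ first, since to_csv will not create it
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df_processed.to_csv(out_path, index=False)

    # Read the file back to confirm what was written
    round_trip = pd.read_csv(out_path)
    print(round_trip)
```

In the notebook itself, `out_path` would simply be `Path("data/processed/weather_stats_final.csv")`.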


## Visualization

Plot the daily temperature and humidity readings over time.


In [7]:
# Convert data to DataFrame
df = pd.DataFrame(data_2018[1:], columns=data_2018[0])
df['Temperature'] = pd.to_numeric(df['Temperature'])
df['Humidity'] = pd.to_numeric(df['Humidity'])
df['Date'] = pd.to_datetime(df['Date'])

# Plot Temperature over Time
plt.figure(figsize=(10,5))
plt.plot(df['Date'], df['Temperature'], marker='o')
plt.title('Temperature Over Time')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.grid(True)
plt.show()

# Plot Humidity over Time
plt.figure(figsize=(10,5))
plt.plot(df['Date'], df['Humidity'], marker='o', color='orange')
plt.title('Humidity Over Time')
plt.xlabel('Date')
plt.ylabel('Humidity')
plt.grid(True)
plt.show()
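
Temperature and humidity can also share a single figure with two y-axes, which makes it easier to compare their trends on the same dates even though the two series cover different ranges. This is a sketch rather than the notebook's original layout; it assumes the sample data and a hypothetical `weather_df` name. `Axes.twinx()` creates a second y-axis that shares the x-axis of the first.

```python
from io import StringIO

import matplotlib.pyplot as plt
import pandas as pd

csv_data = """Date,Temperature,Humidity
2018-01-01,30,45
2018-01-02,32,50
2018-01-03,31,55
2018-01-04,29,60
2018-01-05,28,65
"""

weather_df = pd.read_csv(StringIO(csv_data), parse_dates=["Date"])

# Left y-axis: temperature
fig, ax_temp = plt.subplots(figsize=(10, 5))
ax_temp.plot(weather_df["Date"], weather_df["Temperature"], marker="o")
ax_temp.set_xlabel("Date")
ax_temp.set_ylabel("Temperature")
ax_temp.grid(True)

# Right y-axis: humidity, sharing the same dates on the x-axis
ax_hum = ax_temp.twinx()
ax_hum.plot(weather_df["Date"], weather_df["Humidity"], marker="o", color="orange")
ax_hum.set_ylabel("Humidity")

fig.suptitle("Temperature and Humidity Over Time")
fig.autofmt_xdate()  # rotate date tick labels so they do not overlap
```

In a notebook with `%matplotlib inline` the figure renders at the end of the cell; in a standalone script, call `plt.show()` after building it.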


## Conclusions and Future Work

This notebook demonstrates loading, processing, and analyzing weather data to calculate average temperatures and humidity, as well as finding minimum and maximum temperatures. Future improvements could include robust error handling for missing or malformed data and incorporating additional datasets for more comprehensive analysis.

---

*End of Notebook*