## <b> NB02 - Raininess Index </b>

My Raininess Index (RI) will be composed of the variable <b> rain sum </b>, as I am trying to see whether London is as rainy as the movies make it out to be (and, more generally, how rainy it is compared to other rainy cities). Using rain sum to measure raininess therefore makes more sense than using a variable such as precipitation sum, which also includes snow.

#### This will help me draw conclusions about my two hypotheses:

<b> H0: </b> London is no different from other generally rainy cities in terms of raininess

<b> H1: </b> London is as rainy as the movies make it out to be, defined by how it compares to other cities on raininess

In [222]:
## Importing packages
import pandas as pd
import os
import json
import requests
import numpy as np

from functions import *

from lets_plot import *
LetsPlot.setup_html()

import geopandas as gpd

from lets_plot.geo_data import *

import geodatasets

from IPython.display import HTML

In [223]:
## Reading London data and renaming rain_sum as raininess (as a test-run)
daily_rain_df = pd.read_json('../data/daily_rain.json')

daily_rain_df = daily_rain_df.rename(columns={'rain_sum': 'raininess'})

daily_rain_df.to_csv('../data/london_rain.csv', index=False)

In [224]:
## Reading the complete city data and renaming rain_sum as raininess
all_city_rain_df = pd.read_csv('../data/historical_city_rain_data.csv')

all_city_rain_df = all_city_rain_df.rename(columns={'rain_sum': 'raininess'})

all_city_rain_df.to_csv('../data/historical_city_rain_data.csv', index=False)

In [225]:
## Reloading the renamed city data (read_csv already returns a DataFrame) and displaying the first 5 rows
df = pd.read_csv('../data/historical_city_rain_data.csv')

df.head()

| Unnamed: 0 | country | city | date | raininess |
|---|---|---|---|---|
| 0 | GB | London | 2021-01-01 | 0.4 |
| 1 | GB | London | 2021-01-02 | 0.0 |
| 2 | GB | London | 2021-01-03 | 0.6 |
| 3 | GB | London | 2021-01-04 | 2.3 |
| 4 | GB | London | 2021-01-05 | 2.1 |
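Because later cells assume that every city covers exactly the same days, a quick per-city coverage check can be worth running at this point. A minimal sketch on a small made-up frame with the same column names (`city`, `date`, `raininess`); the values are illustrative only, not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy data with the same columns as historical_city_rain_data.csv
toy = pd.DataFrame({
    'city': ['London', 'London', 'London', 'Oslo', 'Oslo'],
    'date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-01', '2021-01-02'],
    'raininess': [0.4, 0.0, 0.6, 1.2, 0.0],
})
toy['date'] = pd.to_datetime(toy['date'])

# One row per city: first date, last date and number of daily records
coverage = toy.groupby('city')['date'].agg(['min', 'max', 'count'])
print(coverage)
```

On the real data, every city should report the same `min`, `max` and `count`; a city with missing days would shift the hard-coded year list used in the monthly-means cell.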


In [226]:
df['date'] = pd.to_datetime(df['date'])

df = df.set_index('date')

## Grouping by city and resampling to get monthly raininess means
df = df.groupby('city').resample('ME')['raininess'].mean().reset_index()

## Converting the date column to datetime
df['date'] = pd.to_datetime(df['date'])

## Extracting month names
df['month_name'] = df['date'].dt.month_name()

## Dropping the date column because I only want months
df.drop(columns=['date'], inplace=True)

## Renaming the 'raininess' column to 'monthly_mean_raininess'
df.rename(columns={'raininess': 'monthly_mean_raininess'}, inplace=True)

## Here, I've tested and seen that I require years too, so:
## Creating a year list: each of the 5 cities covers Jan 2021 to Jan 2024 (37 months),
## so this only lines up if every city has exactly these months, in this order
year = ([2021] * 12 + [2022] * 12 + [2023] * 12 + [2024] * 1) * 5

## Adding the year list as a new column to the dataframe
df['year'] = year

## Creating a year_month column from month names and years ('%B %Y' matches e.g. 'January 2021')
df['year_month'] = pd.to_datetime(df['month_name'] + ' ' + df['year'].astype(str), format='%B %Y')

## Formatting year_month
df['year_month'] = df['year_month'].dt.strftime('%Y-%m')

## Dropping the separate month_name and year columns as they are now redundant
df.drop(columns=['month_name', 'year'], inplace=True)

## Printing df to check for errors
print(df)

        city  monthly_mean_raininess year_month
0      Kyoto                2.209677    2021-01
1      Kyoto                2.303571    2021-02
2      Kyoto                4.651613    2021-03
3      Kyoto                7.000000    2021-04
4      Kyoto                9.970968    2021-05
..       ...                     ...        ...
180  Seattle                2.620000    2023-09
181  Seattle                2.754839    2023-10
182  Seattle                4.773333    2023-11
183  Seattle                6.887097    2023-12
184  Seattle                0.000000    2024-01

[185 rows x 3 columns]
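The hard-coded `year` list only works while every city has exactly 37 months in the same order. A sketch of an alternative that reads the year straight from each date, so no list has to be maintained. It swaps in `dt.to_period('M')` for `resample('ME')` (a different technique from the one this notebook uses) and runs on a small made-up frame:

```python
import pandas as pd

# Hypothetical daily data for two cities (illustrative values only)
daily = pd.DataFrame({
    'city': ['London'] * 3 + ['Oslo'] * 3,
    'date': pd.to_datetime(['2021-01-30', '2021-01-31', '2021-02-01'] * 2),
    'raininess': [1.0, 3.0, 2.0, 0.0, 4.0, 6.0],
})

# Bucketing each date into its calendar month, then averaging per city and month
monthly = (
    daily.groupby(['city', daily['date'].dt.to_period('M')])['raininess']
    .mean()
    .reset_index(name='monthly_mean_raininess')
)

# The monthly period already carries the year, so 'YYYY-MM' labels need no separate year list
monthly['year_month'] = monthly['date'].dt.strftime('%Y-%m')
monthly = monthly.drop(columns=['date'])
print(monthly)
```

The result has the same three columns as `df` above (`city`, `monthly_mean_raininess`, `year_month`), and it stays correct if a city has a different date range.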




In [227]:
## Saving the monthly means and reading them back in (same relative path for both)
df.to_csv('../data/monthly_avg.csv', index=False)
monthly_avg = pd.read_csv('../data/monthly_avg.csv')

In [228]:
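## Parsing year_month back into datetimes; anything that does not match '%Y-%m' becomes NaT
monthly_avg['year_month'] = pd.to_datetime(monthly_avg['year_month'], format='%Y-%m', errors='coerce')

## Printing any rows that failed to parse (an empty result means none did)
print(monthly_avg[monthly_avg['year_month'].isnull()])

## Sorting chronologically so each city's line is drawn in date order
monthly_avg.sort_values('year_month', inplace=True)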

Empty DataFrame
Columns: [city, monthly_mean_raininess, year_month]
Index: []
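Because `errors='coerce'` silently turns unparseable labels into `NaT` instead of raising an error, the empty output of the null check is what confirms that every `year_month` value parsed. A minimal illustration with made-up labels:

```python
import pandas as pd

# Hypothetical year_month labels; 'Jan 2021' does not match the expected '%Y-%m' format
labels = pd.Series(['2021-01', '2021-02', 'Jan 2021'])

# errors='coerce' turns anything that does not parse as '%Y-%m' into NaT instead of raising
parsed = pd.to_datetime(labels, format='%Y-%m', errors='coerce')

# Rows that failed to parse, which is what the isnull() check is looking for
bad_rows = labels[parsed.isnull()]
print(bad_rows)
```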


In [245]:
## Plotting one line per city across all months
ggplot(monthly_avg, aes(x='year_month', y='monthly_mean_raininess', group='city')) + \
    geom_line(aes(color='city'), size=1, alpha=0.5) + \
    xlab('Year and Month') + \
    ylab('Monthly Mean Raininess') + \
    ggtitle('Monthly Mean Raininess for Each City Between 2021 and 2024')

In [232]:
## Calculating the mean raininess of each city, to map out the cities' raininess
df = pd.read_csv('../data/historical_city_rain_data.csv')

mean_raininess_per_city = df.groupby('city')['raininess'].mean().reset_index()

mean_raininess_per_city.columns = ['city', 'mean_raininess']

print(mean_raininess_per_city)

      city  mean_raininess
0    Kyoto        5.004197
1   London        2.016423
2   Munich        2.731934
3     Oslo        2.245985
4  Seattle        2.977190
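The printed means can already be turned into a simple ranking, as a first look at the hypotheses before any daily comparison. A sketch that copies the five values above into a small frame; the `rank` column and the ratio to the other cities are illustrative additions, not part of the original analysis:

```python
import pandas as pd

# Mean daily raininess per city, copied from the output above
means = pd.DataFrame({
    'city': ['Kyoto', 'London', 'Munich', 'Oslo', 'Seattle'],
    'mean_raininess': [5.004197, 2.016423, 2.731934, 2.245985, 2.977190],
})

# Rank 1 = rainiest city on average
means['rank'] = means['mean_raininess'].rank(ascending=False).astype(int)

# London's mean relative to the average of the other four cities
others = means.loc[means['city'] != 'London', 'mean_raininess'].mean()
london = means.loc[means['city'] == 'London', 'mean_raininess'].iloc[0]
print(means.sort_values('rank'))
print(f"London / mean of others: {london / others:.2f}")
```

On these averages London has the lowest mean of the five cities. A single mean per city says nothing about how often each city is the rainiest on a given day, which is what the daily ranking below addresses.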


In [233]:
## Saving mean raininess data
mean_raininess_per_city.to_csv('../data/mean_raininess_per_city.csv', index=False)

In [234]:
## Plotting mean raininess data on a live map
data = pd.read_csv('../data/mean_raininess_per_city.csv')

## Geocoding each city name to a point (its centroid) so it can be placed on the map
centroids = geocode_cities(data["city"]).get_centroids()

p = ggplot() + ggsize(800, 500)

plot = (
    p + 
    geom_livemap(zoom=2.75) +
    ## Dashed reference line along the equator (latitude 0)
    geom_hline(yintercept=0, color='#e0218a', linetype=2, size=1) +
    ## One point per city, sized by mean raininess and joined to the geocoded centroids by city name
    geom_point(aes(size='mean_raininess', color='city'), 
               show_legend=True,  
               data=data,
               map=centroids, 
               map_join="city", 
               tooltips=layer_tooltips().title("@city"))
)

plot

To analyze the data, I will:
- rank the cities by raininess on every single day, to find which city is number 1 on each day

And then:
- count how many days London is number 1
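The ranking plan above can be sketched in pandas as follows. This is only an assumption about how the ranking could be done, using `groupby(...).rank(...)` on a small made-up frame rather than the real city data:

```python
import pandas as pd

# Hypothetical daily rain for three cities over three days (illustrative values only)
daily = pd.DataFrame({
    'date': ['2021-01-01'] * 3 + ['2021-01-02'] * 3 + ['2021-01-03'] * 3,
    'city': ['London', 'Oslo', 'Seattle'] * 3,
    'raininess': [2.3, 1.0, 0.5, 0.0, 4.1, 3.0, 5.2, 5.2, 1.1],
})

# Rank the cities within each day: 1 = rainiest; method='min' lets tied cities share rank 1
daily['daily_rank'] = daily.groupby('date')['raininess'].rank(ascending=False, method='min')

# Count the days on which each city was number 1
number_ones = daily[daily['daily_rank'] == 1].groupby('city').size()
london_days = number_ones.get('London', 0)
print(number_ones)
print(f"London was number 1 on {london_days} of {daily['date'].nunique()} days")
```

How to treat ties is a design choice: `method='min'` counts every tied city as number 1, so on a day when every city reports 0 all of them would "win". Dropping days with no rain anywhere before ranking would be one alternative.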

Click [here](https://github.com/lse-ds105/ds105a-2024-w06-summative-deyavuz/blob/58b6e8a429c921efab90234ebc7dba23f7571c37/code/NB03-Rankings.ipynb) to navigate to NB03 - Rankings!