<a href="https://colab.research.google.com/github/SallyPeter/gomycodeDSbootcamp/blob/main/Python/Checkpoint_Plotly.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **What You're Aiming For**

In this checkpoint, we are going to work on the 'Climate change in Africa' dataset, which was provided by the U.S. Global Change Research Program.

**Dataset description:** This dataset contains historical daily minimum, maximum and average temperatures for 5 African countries (Egypt, Tunisia, Cameroon, Senegal, Angola) between 1980 and 2023.

➡️ Dataset link

https://i.imgur.com/w2czdso.jpg


## **Instructions**

1. Load the dataset into a data frame using Python.
2. Clean the data as needed.
3. Plot a line chart to show the average temperature fluctuations in Tunisia and Cameroon. Interpret the results.
4. Zoom in to only include data between 1980 and 2005, try to customize the axes labels.
5. Create Histograms to show temperature distribution in Senegal between [1980,2000] and [2000,2023] (in the same figure). Describe the obtained results.
6. Select the best chart to show the Average temperature per country.
7. Make your own questions about the dataset and try to answer them using the appropriate visuals.


In [None]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
# Load the dataset into a data frame using Python.
data = pd.read_csv('Africa_climate_change.csv')
data.head()

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
# Clean the data as needed.

# Handling missing data
for each in ['TAVG', 'TMAX', 'TMIN']:
  data[each] = data[each].fillna(data[each].mean())

data['PRCP'] = data['PRCP'].fillna(data['PRCP'].mode()[0])

data.info()

In [None]:
#Handling wrong data type

data.DATE = pd.to_datetime(data['DATE'])

data.head(2)

In [None]:
type(data.DATE[0])

In [None]:
data[data['TAVG'] < 0]

**There is only one occurrence of a negative number in the TAVG column. It looks like an error, given the TMAX and TMIN values of the same record, so we convert it to a positive number.**
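A minimal sketch of how such a record could be located and corrected without hard-coding its row label. The frame below is made up for illustration (the values are not taken from the dataset), and the rule used is an assumption: a negative `TAVG` is treated as a sign error only when its absolute value falls between `TMIN` and `TMAX` of the same record.

```python
import pandas as pd

# Toy records (illustrative values only, not from the real dataset).
toy = pd.DataFrame({
    "TAVG": [75.0, -68.0, 80.0],
    "TMAX": [85.0, 79.0, 91.0],
    "TMIN": [66.0, 60.0, 70.0],
})

# Assumption: a negative TAVG whose absolute value sits between TMIN and TMAX
# is a sign error rather than a genuine reading.
sign_error = (toy["TAVG"] < 0) & toy["TAVG"].abs().between(toy["TMIN"], toy["TMAX"])

# Flip the sign only on the flagged rows and leave everything else untouched.
toy.loc[sign_error, "TAVG"] = toy.loc[sign_error, "TAVG"].abs()
print(toy)
```

Selecting rows with a boolean mask keeps the fix valid even if the row order or index changes after reloading the file.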

In [None]:
# Flip the sign of the single erroneous record (data.at[460304, 'TAVG'] works equally well for a scalar).
data.loc[460304, 'TAVG'] = abs(data.loc[460304, 'TAVG'])
data.loc[460304, 'TAVG']  # read back by label, consistent with the assignment above

In [None]:
# Plot a line chart to show the average temperature fluctuations in Tunisia and Cameroon. Interpret the results.
mask = data['COUNTRY'].isin(['Tunisia', 'Cameroon'])
tun_cam = data[mask]
tun_cam.head()

In [None]:
fig = px.line(tun_cam, x='DATE', y='TAVG', color='COUNTRY')
fig.show()

In [None]:
data.DATE.max()#.month

In [None]:
# Zoom in to only include data between 1980 and 2005, try to customize the axes labels.
import datetime

start_date = datetime.datetime(1980, 1, 1) #'1980-01-01 00:00:00'
end_date = datetime.datetime(2005, 12, 31) #'2005-12-31 00:00:00'

fig.update_xaxes(type="date", range=[start_date, end_date], title=f'Date ({start_date.year} to {end_date.year})')
fig.update_layout(title = "Average temperature fluctuations in Tunisia and Cameroon", title_x = 0.5 )

**The line chart above shows the average temperature fluctuations in Tunisia and Cameroon over the years. Tunisia experiences a wider range of temperatures, falling close to 40 degrees and rising up to 90 degrees. The average temperature in Cameroon, on the other hand, mostly stays between 70 and 80 degrees.**
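The ranges read off the chart can also be checked numerically. The sketch below computes the minimum, maximum and spread of `TAVG` per country on a small made-up frame; the numbers are illustrative only and are not results from the dataset.

```python
import pandas as pd

# Toy daily averages (illustrative values only, not from the real dataset).
toy = pd.DataFrame({
    "COUNTRY": ["Tunisia", "Tunisia", "Tunisia", "Cameroon", "Cameroon", "Cameroon"],
    "TAVG": [45.0, 68.0, 88.0, 72.0, 76.0, 79.0],
})

# Lowest and highest average temperature per country, plus the spread between them.
summary = toy.groupby("COUNTRY")["TAVG"].agg(["min", "max"])
summary["range"] = summary["max"] - summary["min"]
print(summary)
```

Running the same `groupby` on `tun_cam` should give the actual figures behind the visual impression.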

In [None]:
# Create Histograms to show temperature distribution in Senegal between [1980,2000] and [2000,2023] (in the same figure). Describe the obtained results.
sen = data[data.COUNTRY == 'Senegal'].copy()  # copy so that adding a column does not raise SettingWithCopyWarning
sen['YEAR'] = sen.DATE.dt.year
sen = sen[sen.YEAR <= 2023]
sen.tail()

In [None]:
sen_1 = sen[sen.YEAR <= 2000]
sen_2 = sen[sen.YEAR > 2000]

In [None]:
fig = make_subplots( rows = 1, cols=2, specs=[
    [{"type":"histogram"}, {"type":"histogram"}]],
                     subplot_titles=("Avg. Temp distribution in Senegal 1980 - 2000", "Avg. Temp distribution in Senegal 2001 - 2023"))


# fig.add_trace(go.Histogram(x = sen_1.TAVG, nbinsx=20), row=1, col=1)
# fig.add_trace(go.Histogram(x = sen_2.TAVG, nbinsx=20), row=1, col=2)

fig.add_trace(go.Histogram(x = sen_1.TAVG), row=1, col=1)
fig.add_trace(go.Histogram(x = sen_2.TAVG), row=1, col=2)

# Without row/col arguments, these calls label the axes of both subplots.
fig.update_xaxes(title_text="Average Temperature Values")
fig.update_yaxes(title_text="Count")

fig.show()

**Between 1980 and 2000, the average temperature in Senegal lay mostly between 75 and 90 degrees. Although this is already hot, temperatures visibly rose between 2001 and 2023: the distribution leans more towards 95 degrees, with outliers above 100 degrees.**
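A shift between two histograms is easier to state with a couple of summary numbers. The sketch below labels each record by period and compares the median and mean of `TAVG`; the frame is made up for illustration, and the `PERIOD` column is a hypothetical helper that does not exist in the notebook.

```python
import pandas as pd

# Toy yearly samples (illustrative values only, not from the real dataset).
toy = pd.DataFrame({
    "YEAR": [1985, 1992, 2000, 2001, 2010, 2020],
    "TAVG": [78.0, 82.0, 86.0, 84.0, 90.0, 96.0],
})

# Same split as in the notebook: up to 2000, and 2001 onwards.
toy["PERIOD"] = toy["YEAR"].apply(lambda y: "1980-2000" if y <= 2000 else "2001-2023")

# Median and mean average temperature for each period.
stats = toy.groupby("PERIOD")["TAVG"].agg(["median", "mean"])
print(stats)
```

Applied to `sen`, a higher median in the second period would support the reading of the histograms above.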

In [None]:
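# Select the best chart to show the Average temperature per country.

# Aggregate first: passing the raw rows to px.bar stacks every daily record,
# so the bar heights would be sums of TAVG rather than averages.
avg_temp = data.groupby('COUNTRY', as_index=False)['TAVG'].mean()
fig = px.bar(avg_temp, x='COUNTRY', y='TAVG', title='Average temperature per country', color='COUNTRY')
fig.show()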

In [None]:
data.head()

In [None]:
data.COUNTRY.unique()

In [None]:
# Make your own questions about the dataset and try to answer them using the appropriate visuals.



In [None]:
px.histogram(data[data.COUNTRY == 'Tunisia']['PRCP'] )

In [None]:
# What is the distribution of precipitation (PRCP) in each country?

fig = make_subplots( rows = 3, cols=2, specs=[
    [{"type":"histogram"}, {"type":"histogram"}],
     [{"type":"histogram"}, {"type":"histogram"}],
      [{"type":"histogram"}, None]] ,
                     subplot_titles=("Distribution of PRCP for Tunisia", "Distribution of PRCP for Cameroon", "Distribution of PRCP for Senegal", "Distribution of PRCP for Egypt", "Distribution of PRCP for Angola", None))


fig.add_trace(go.Histogram(x = data[data.COUNTRY == 'Tunisia']['PRCP']), row=1, col=1)
fig.add_trace(go.Histogram(x = data[data.COUNTRY == 'Cameroon']['PRCP']), row=1, col=2)
fig.add_trace(go.Histogram(x = data[data.COUNTRY == 'Senegal']['PRCP']), row=2, col=1)
fig.add_trace(go.Histogram(x = data[data.COUNTRY == 'Egypt']['PRCP']), row=2, col=2)
fig.add_trace(go.Histogram(x = data[data.COUNTRY == 'Angola']['PRCP']), row=3, col=1)


# Without row/col arguments, these calls label the axes of every subplot.
fig.update_xaxes(title_text="PRCP Values")
fig.update_yaxes(title_text="Count")


fig.show()

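Not as informative as expected, but it does show that for every country the most frequent PRCP value was 0. Note that missing PRCP values were filled with the column's mode during cleaning, which adds further to the count of the most common value.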
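When one value dominates a histogram, it can help to report its share directly and then look at the remaining values on their own. A minimal sketch on a made-up frame (illustrative values only):

```python
import pandas as pd

# Toy precipitation records (illustrative values only, not from the real dataset).
toy = pd.DataFrame({
    "COUNTRY": ["Egypt"] * 4 + ["Cameroon"] * 4,
    "PRCP": [0.0, 0.0, 0.0, 0.1, 0.0, 0.5, 1.2, 0.0],
})

# Fraction of records with no precipitation, per country.
zero_share = toy["PRCP"].eq(0).groupby(toy["COUNTRY"]).mean()
print(zero_share)

# Keep only the wet days, where the zeros no longer hide the distribution.
wet = toy[toy["PRCP"] > 0]
print(wet)
```

Plotting a histogram of the wet days only would likely be more informative than the figure above.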

In [None]:
data.columns

In [None]:
# What are the maximum and minimum temperatures (TMAX and TMIN) for each country over time?


fig = make_subplots( rows = 5, cols=2, specs=[
    [{"type":"scatter"}, {"type":"scatter"}],
     [{"type":"scatter"}, {"type":"scatter"}],
    [{"type":"scatter"}, {"type":"scatter"}],
    [{"type":"scatter"}, {"type":"scatter"}],
    [{"type":"scatter"}, {"type":"scatter"}]
      ] ,
    subplot_titles=("TMAX over time for Tunisia", "TMIN over time for Tunisia", "TMAX over time for Cameroon", "TMIN over time for Cameroon",
                    "TMAX over time for Senegal", "TMIN over time for Senegal",  "TMAX over time for Egypt", "TMIN over time for Egypt",
                    "TMAX over time for Angola", "TMIN over time for Angola"))

tun_data = data[data.COUNTRY == 'Tunisia']
cam_data = data[data.COUNTRY == 'Cameroon']
sen_data = data[data.COUNTRY == 'Senegal']
egy_data = data[data.COUNTRY == 'Egypt']
ang_data = data[data.COUNTRY == 'Angola']

fig.add_trace(go.Scatter( x = tun_data['DATE'], y=tun_data['TMAX'], mode='lines'), row=1, col=1)
fig.add_trace(go.Scatter( x = tun_data['DATE'], y=tun_data['TMIN'], mode='lines'), row=1, col=2)
fig.add_trace(go.Scatter(x = cam_data['DATE'], y=cam_data['TMAX'], mode='lines'), row=2, col=1)
fig.add_trace(go.Scatter(x = cam_data['DATE'], y=cam_data['TMIN'], mode='lines'), row=2, col=2)
fig.add_trace(go.Scatter(x =sen_data[ 'DATE'], y=sen_data['TMAX'], mode='lines'), row=3, col=1)
fig.add_trace(go.Scatter(x =sen_data[ 'DATE'], y=sen_data[ 'TMIN'], mode='lines'), row=3, col=2)
fig.add_trace(go.Scatter(x = egy_data['DATE'], y=egy_data['TMAX'], mode='lines'), row=4, col=1)
fig.add_trace(go.Scatter(x = egy_data['DATE'], y=egy_data['TMIN'], mode='lines'), row=4, col=2)
fig.add_trace(go.Scatter(x = ang_data['DATE'], y=ang_data['TMAX'], mode='lines'), row=5, col=1)
fig.add_trace(go.Scatter(x = ang_data['DATE'], y=ang_data['TMIN'], mode='lines'), row=5, col=2)


# Without row/col arguments, these calls label the axes of every subplot.
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Temp. Values")


fig.show()

In [None]:
# What is the correlation between average temperature (TAVG) and precipitation (PRCP) across different countries?
scat = px.scatter(data, x='TAVG', y='PRCP',  color='COUNTRY', opacity=0.6)
scat.show()
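The scatter plot gives only a visual impression of the relationship. A Pearson coefficient per country puts a number on it. The sketch below uses a made-up frame with placeholder country labels "A" and "B"; the values are constructed to give a perfectly negative and a perfectly positive correlation and say nothing about the real data.

```python
import pandas as pd

# Toy records with placeholder country labels (illustrative values only).
toy = pd.DataFrame({
    "COUNTRY": ["A"] * 4 + ["B"] * 4,
    "TAVG": [60.0, 70.0, 80.0, 90.0, 60.0, 70.0, 80.0, 90.0],
    "PRCP": [0.4, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4],
})

# Pearson correlation between TAVG and PRCP, computed separately for each country.
corr = pd.Series({
    country: group["TAVG"].corr(group["PRCP"])
    for country, group in toy.groupby("COUNTRY")
})
print(corr)
```

Values close to 0 on the real data would indicate little linear relationship between average temperature and precipitation.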

In [None]:
tun_data.groupby(tun_data['DATE'].dt.month)[['TMAX','TMIN']].mean()

In [None]:
# How do temperature extremes (TMAX and TMIN) vary across different months in each country?

fig = make_subplots( rows = 3, cols=2, specs=[
    [{"type":"bar"}, {"type":"bar"}],
     [{"type":"bar"}, {"type":"bar"}],
      [{"type":"bar"}, None]] ,
                     subplot_titles=("Average Monthly Temperatures for Tunisia", "Average Monthly Temperatures for Cameroon", "Average Monthly Temperatures for Senegal", "Average Monthly Temperatures for Egypt",
                                     "Average Monthly Temperatures for Angola", None))

avg_tun = tun_data.groupby(tun_data['DATE'].dt.month)[['TMAX','TMIN']].mean()
avg_cam = cam_data.groupby(cam_data['DATE'].dt.month)[['TMAX','TMIN']].mean()
avg_sen = sen_data.groupby(sen_data['DATE'].dt.month)[['TMAX','TMIN']].mean()
avg_egy = egy_data.groupby(egy_data['DATE'].dt.month)[['TMAX','TMIN']].mean()
avg_ang = ang_data.groupby(ang_data['DATE'].dt.month)[['TMAX','TMIN']].mean()

fig.add_trace(go.Bar( x = avg_tun.index, y=avg_tun['TMIN']), row=1, col=1)
fig.add_trace(go.Bar( x = avg_tun.index, y=avg_tun['TMAX']), row=1, col=1)
fig.add_trace(go.Bar( x = avg_cam.index, y=avg_cam['TMIN']), row=1, col=2)
fig.add_trace(go.Bar( x = avg_cam.index, y=avg_cam['TMAX']), row=1, col=2)
fig.add_trace(go.Bar( x = avg_sen.index, y=avg_sen['TMIN']), row=2, col=1)
fig.add_trace(go.Bar( x = avg_sen.index, y=avg_sen['TMAX']), row=2, col=1)
fig.add_trace(go.Bar( x = avg_egy.index, y=avg_egy['TMIN']), row=2, col=2)
fig.add_trace(go.Bar( x = avg_egy.index, y=avg_egy['TMAX']), row=2, col=2)
fig.add_trace(go.Bar( x = avg_ang.index, y=avg_ang['TMIN']), row=3, col=1)
fig.add_trace(go.Bar( x = avg_ang.index, y=avg_ang['TMAX']), row=3, col=1)


# Without row/col arguments, these calls label the axes of every subplot.
fig.update_xaxes(title_text="Month")
fig.update_yaxes(title_text="Temp. Values")


fig.update_layout(barmode='stack')
# fig.update_legends()
fig.show()

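*PS:* The bottom bars are the minimum temperatures while the top ones are the maximum. Because the bars are stacked, the height of the top segment, not the total bar height, represents the average maximum temperature.

We can observe that Tunisia and Egypt show a visible change in temperature between June and September, while the other countries have fairly steady temperatures from month to month.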
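How steady a country's temperatures are across the year can be summarised as a seasonal amplitude: the gap between its warmest and coolest monthly mean. A minimal sketch on a made-up frame (illustrative values only; `amplitude` is a name introduced here for the example):

```python
import pandas as pd

# Toy records for one country across two years (illustrative values only).
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2000-01-15", "2000-07-15", "2001-01-15", "2001-07-15"]),
    "TMAX": [60.0, 95.0, 62.0, 97.0],
})

# Mean TMAX per calendar month, averaged over all years.
monthly = toy.groupby(toy["DATE"].dt.month)["TMAX"].mean()

# Seasonal amplitude: warmest monthly mean minus coolest monthly mean.
amplitude = monthly.max() - monthly.min()
print(monthly)
print(amplitude)
```

Computing this on `avg_tun`, `avg_cam` and the other monthly frames would give one number per country to compare.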

In [None]:
# Which country experiences the highest variability in daily temperatures (TMAX - TMIN)?  --Chat GPT Assisted

data['Temp_Diff'] = data['TMAX'] - data['TMIN']

country_variability = data.groupby('COUNTRY')['Temp_Diff'].std().reset_index()

fig = px.bar(
    country_variability,
    x='COUNTRY',
    y='Temp_Diff',
    title='Temperature Variability (TMAX - TMIN) by Country',
    labels={'Temp_Diff': 'Temperature Variability (Standard Deviation)', 'COUNTRY': 'Country'},
    color='Temp_Diff',  # Color bars based on variability
    text='Temp_Diff',  # Display the variability value on the bars
    height=500
)

# Show the variability value above each bar, rounded to two decimals
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.show()

Tunisia has the most variability in its daily temperature range (the standard deviation of TMAX - TMIN), followed by Egypt and Senegal, while Angola has the least.