#  Fuel Emissions

#### Hello there, 
Throughout this project you will see plots of different data that provide meaningful insights for the viewer. The data comes from carbon_data.xlsx, a workbook made up of several sheets representing data for each country. We cleaned and reshaped the data and brought out some interesting findings.

In [None]:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import seaborn as sns

The code reads data from an Excel file named "carbon_data.xlsx" using pandas. 

1. First, it opens the Excel file and stores it in the variable `data` using the `pd.ExcelFile` function.
2. Then, it reads the data from the second sheet (indexed at 1) of the Excel file into a pandas DataFrame named `carbon_data` using the `pd.read_excel` function.

This code snippet allows for easy importation and manipulation of data from Excel files into Python using pandas, enabling further analysis and visualization.


In [None]:
data = pd.ExcelFile("carbon_data.xlsx")
carbon_data = pd.read_excel(data, sheet_name=1)

In [None]:
carbon_data

The code generates a bar plot showing the average daily time spent on the internet by individuals following different diets. The data is grouped by the 'Diet' column, and the mean of the 'How Long Internet Daily Hour' column is calculated for each group. The resulting grouped data is then visualized using matplotlib.

- The x-axis represents the different diets.
- The y-axis represents the average daily time spent on the internet in hours.
- Each bar corresponds to a diet category, with the height indicating the average time spent on the internet.

The bars are color-coded for visual distinction, using shades of blue ('#016064', '#48AAAD', '#022D36','#1F456E'). Grid lines are included for better readability.

The figure size is set to (7.5, 5) inches to ensure proper visualization, and tight layout adjustment is applied to avoid overlapping elements. Finally, the plot is displayed using plt.show().
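
To make the grouping step concrete, here is a minimal sketch on a made-up four-row table. The column names mirror the ones used in this notebook, but the diet labels and hour values are invented purely for illustration.

```python
import pandas as pd

# Toy data (invented values); only the column names come from the real sheet.
toy = pd.DataFrame({
    "Diet": ["vegan", "omnivore", "vegan", "omnivore"],
    "How Long Internet Daily Hour": [2, 5, 4, 7],
})

# groupby + mean gives one value per diet; reset_index turns the
# group labels back into a regular column, ready for plt.bar.
toy_grouped = (
    toy.groupby("Diet")["How Long Internet Daily Hour"].mean().reset_index()
)
print(toy_grouped)
```

Each row of `toy_grouped` becomes one bar: the 'Diet' value is the x position and the mean is the bar height.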


In [None]:
grouped_data = carbon_data.groupby('Diet')['How Long Internet Daily Hour'].mean().reset_index()
plt.figure(figsize=(7.5, 5))

plt.bar(grouped_data['Diet'], grouped_data['How Long Internet Daily Hour'], color=['#016064', '#48AAAD', '#022D36','#1F456E'],width=0.4)
plt.title('Average Time Spent on Internet by Diet')
plt.xlabel('Diet')
plt.ylabel('Average Time on Internet (hours)')

plt.grid(True)
plt.tight_layout()

plt.show()

The code generates a bar plot showing the average monthly distance traveled by vehicles for different transportation modes. The data is grouped by the 'Transport' column, and the mean of the 'Vehicle Monthly Distance Km' column is calculated for each group. The resulting grouped data is then visualized using matplotlib.

- The x-axis represents the transportation modes.
- The y-axis represents the average distance traveled in kilometers.
- Each bar corresponds to a transportation mode, with the height indicating the average distance traveled.

The bars are color-coded for visual distinction, using shades of blue ('#016064', '#48AAAD', '#022D36','#1F456E'). Grid lines are included for better readability.

The figure size is set to (7.5, 5) inches to ensure proper visualization, and tight layout adjustment is applied to avoid overlapping elements. Finally, the plot is displayed using plt.show().


Loading data from sheet

In [None]:
grouped_data = carbon_data.groupby('Transport')['Vehicle Monthly Distance Km'].mean().reset_index()
plt.figure(figsize=(7.5, 5))

plt.bar(grouped_data['Transport'], grouped_data['Vehicle Monthly Distance Km'], color=['#016064', '#48AAAD', '#022D36','#1F456E'],width=0.4)
plt.title('Average Vehicle Monthly Distance Km by Transport')
plt.xlabel('Transport')
plt.ylabel('Average Km using Transport')

plt.grid(True)
plt.tight_layout()

plt.show()


The code generates a pie chart to illustrate the distribution of average monthly distances traveled by vehicles across various transportation modes.

The data in the DataFrame `carbon_data` is grouped by the transportation mode ('Transport') column, and the mean monthly distance traveled ('Vehicle Monthly Distance Km') is calculated for each group. The resulting grouped data is stored in the variable `grouped_data`.

A pie chart is created using matplotlib's `plt.pie()` function. Each slice of the pie represents a transportation mode, and its size corresponds to the average monthly distance traveled by vehicles for that mode.

The pie chart is colored using predefined color codes to enhance visual clarity.

The title of the pie chart indicates its purpose: to display the average vehicle monthly distance by transport mode.

Tight layout adjustment is applied to ensure proper spacing of plot elements.

A legend is included in the plot, positioned for optimal readability. The legend labels correspond to the different transportation modes.

Finally, the plot is displayed using `plt.show()`.

This visualization provides an intuitive representation of how the average monthly distances traveled by vehicles are distributed among different transportation modes.
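
As a small sketch of what the percentage labels mean, the snippet below reproduces the arithmetic behind `autopct='%1.1f%%'` on invented numbers. The transport mode names and the kilometre values are assumptions for illustration, not values from the sheet.

```python
import pandas as pd

# Hypothetical group means in km (invented values and mode names).
toy_means = pd.Series({"private": 300.0, "public": 150.0, "walk/bicycle": 50.0})

# A slice's size is its value divided by the sum of all values.
shares = toy_means / toy_means.sum() * 100

# The same one-decimal formatting that autopct='%1.1f%%' applies.
labels = [f"{pct:.1f}%" for pct in shares]
print(labels)
```

Because the inputs are group means, each percentage is a share of the sum of the means, not a share of the total distance traveled by everyone in the dataset.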


In [None]:
grouped_data = carbon_data.groupby('Transport')['Vehicle Monthly Distance Km'].mean().reset_index()

plt.figure(figsize=(7.5, 5))

plt.pie(grouped_data['Vehicle Monthly Distance Km'], labels=grouped_data['Transport'], colors=[ '#F5DAD2', '#BACD92', '#75A47F'], autopct='%1.1f%%')

plt.title('Average Vehicle Monthly Distance Km by Transport')

plt.tight_layout()
plt.legend(loc="best", labels=grouped_data['Transport'])

plt.show()


In [None]:
data_1 = pd.ExcelFile("carbon_data.xlsx")
df = pd.read_excel(data_1, sheet_name=4)
df  

Extracting the data needed for plotting.

In [None]:
# Combine the conditions into one boolean mask per country; chaining
# [...][...] filters makes pandas reindex each mask and emit a warning.
energy_co2 = (df.Item == "Energy") & (df.Element == "Emissions (CO2)")

df_1 = df[(df.Area == "United States of America") & energy_co2]

df_2 = df[(df.Area == "Russian Federation") & energy_co2]

df_3 = df[(df.Area == "Iran (Islamic Republic of)") & energy_co2]

df_4 = df[(df.Area == "China") & energy_co2]

The code below does the following:
1. Takes the data stored from 2000-2020 for each country
2. Flattens the data for plotting
3. Plots the result
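
The select-and-flatten steps above can be sketched on a tiny frame. The country name and values below are invented; the only assumption carried over from the notebook is that the year columns are stored as integers.

```python
import pandas as pd

# Toy single-row frame with integer year columns (invented values).
toy = pd.DataFrame({"Area": ["Country A"], 2000: [1.5], 2001: [2.5], 2002: [4.0]})

years = [2000, 2001, 2002]
selected = toy[years]              # shape (1, 3): one row, one column per year
flat = selected.values.flatten()   # shape (3,): 1-D array that plt.plot accepts
x_labels = [str(y) for y in years] # string labels for the x-axis

print(selected.shape, flat, x_labels)
```

Flattening only gives one value per year when the filter leaves exactly one row, so it is worth checking `selected.shape` before plotting.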

In [None]:
# The year columns are integers in the sheet; select 2000-2020 for each country.
years = list(range(2000, 2021))
df_1_selected = df_1[years]
df_2_selected = df_2[years]
df_3_selected = df_3[years]
df_4_selected = df_4[years]

# String labels so matplotlib treats the years as evenly spaced categories.
x_axis = [str(year) for year in years]

y_axis = df_1_selected.values.flatten()
y_1_axis = df_2_selected.values.flatten()
y_2_axis = df_3_selected.values.flatten()
y_3_axis = df_4_selected.values.flatten()

plt.figure(figsize=(15, 8))

plt.plot(x_axis, y_axis, color='blue', linewidth=3, linestyle='--', marker='o', label='USA')
plt.plot(x_axis, y_1_axis, color='orange', linewidth=3, linestyle='--', marker='d', label='Russia')
plt.plot(x_axis, y_2_axis, color='red', linewidth=3, linestyle='--', marker='v', label='Iran')
plt.plot(x_axis, y_3_axis, color='purple', linewidth=3, linestyle='--', marker='^', label='China')

plt.xlabel('Years') 
plt.ylabel('CO2 Emissions(Kilotonnes)')
plt.title('CO2 Emissions Over Time')

plt.xticks(rotation=45)
plt.legend()
plt.grid(True)

plt.show()

Here we calculate each country's cumulative 2000-2020 CO2 emissions per person, using one fixed population figure per country:
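
Before the real numbers, here is a worked example of the unit conversion and division on invented figures. The three yearly values and the population are hypothetical; the factor of 1000 converts kilotonnes to tonnes.

```python
import numpy as np

# Invented yearly emissions in kilotonnes for three years.
emissions_kt = np.array([1000.0, 2000.0, 3000.0])
population = 2_000_000  # hypothetical fixed population

emissions_t = emissions_kt * 1000  # kilotonnes -> tonnes

# Tonnes per person summed over the whole period.
cumulative_per_capita = emissions_t.sum() / population

# Tonnes per person for each individual year.
yearly_per_capita = emissions_t / population

print(cumulative_per_capita, yearly_per_capita)
```

Using one population figure for every year is a simplification, since populations change over two decades.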

In [None]:
# Define CO2 emissions in tons and populations
co2_emissions = {
    "United States": y_axis * 1000,
    "Russian Federation": y_1_axis * 1000,
    "Iran (Islamic Republic of)" : y_2_axis * 1000,
    "China" : y_3_axis * 1000
    
}

populations = {
    "United States": 341571980,
    "Russian Federation": 144021820,
    "Iran (Islamic Republic of)": 89724761,
    "China" : 1425244312

}

# Build one summary row per country
dfs = []
for country, emissions in co2_emissions.items():
    df_capita = pd.DataFrame({'Year': range(2000, 2021), 'CO2 Emission': emissions})
    total_emission = df_capita['CO2 Emission'].sum()
    # Cumulative 2000-2020 emissions (tons) divided by one fixed population figure
    cumulative_per_capita = total_emission / populations[country]
    dfs.append({"Country": country, "Population": populations[country], "CO2 Emissions per capita(2000-2020)": cumulative_per_capita})

# Combine the rows into a single DataFrame
df_mean = pd.DataFrame(dfs)

df_mean

Here we plot the data obtained above:

In [None]:
plt.figure(figsize=(15, 8))
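for country, emissions in co2_emissions.items():
    # Divide each year's emissions (tons) by the fixed population so the
    # plotted values match the per-capita title and axis label
    emissions_per_capita = emissions / populations[country]
    plt.plot(range(2000, 2021), emissions_per_capita, label=country)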

plt.title('CO2 Emissions per Capita (2000-2020)')
plt.xlabel('Year')
plt.ylabel('CO2 Emission per Capita (tons)')
plt.legend()
plt.grid(True)
plt.show()

A prediction for the next 10 years of CO2 emissions:
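
The idea is to fit a straight line through the yearly values and extend it forward. The sketch below shows the same idea with `np.polyfit` instead of scikit-learn's `LinearRegression` (a deliberate swap to keep the example dependency-light), on an invented, perfectly linear series.

```python
import numpy as np

# Invented series: starts at 10 and grows by 2 every year.
years = np.arange(2000, 2005)
values = 2.0 * (years - 2000) + 10.0

# Least-squares straight line: values ~ slope * year + intercept.
slope, intercept = np.polyfit(years, values, deg=1)

# Extend the fitted line to years outside the observed range.
future_years = np.arange(2005, 2007)
forecast = slope * future_years + intercept

print(slope, forecast)
```

A straight-line extrapolation assumes the past trend continues unchanged, so the predictions are best read as a rough projection rather than a forecast.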

In [None]:
# Perform linear regression to predict CO2 emissions for the next 10 years for each country
predicted_emissions = {}
for country, emissions in co2_emissions.items():
    # Create DataFrame specific to the current country
    df = pd.DataFrame({'Year': range(2000, 2021), 'CO2 Emission': emissions})

    # Perform linear regression for this country's data
    X = np.array(df['Year']).reshape(-1, 1)
    y = np.array(df['CO2 Emission'])
    model = LinearRegression().fit(X, y)

    # Predict future years for this country
    future_years = np.array(range(2021, 2031)).reshape(-1, 1)
    predicted_co2 = model.predict(future_years)

    # Store predictions for the current country
    predicted_emissions[country] = predicted_co2

# Plotting the data
plt.figure(figsize=(15, 8))  # Increase figure size for better readability

# Define a color list for different countries
colors = ['b', 'g', 'r', 'c']  # Adjust colors as needed

for i, (country, emissions) in enumerate(predicted_emissions.items()):
    # Plot actual emissions with solid line and opacity
    plt.plot(range(2000, 2021), co2_emissions[country], label=f"{country} Actual", 
            color=colors[i], alpha=0.7)
    # Plot predicted emissions with dashed line and different style
    plt.plot(range(2021, 2031), emissions, linestyle='--', linewidth=2, marker='o', 
            label=f"{country} Predicted", color=colors[i])

# Customize labels and title
plt.title('CO2 Emissions Over Time (2000-2030)', fontsize=16)  # Increase font size
plt.xlabel('Year', fontsize=14)
plt.ylabel('CO2 Emission (tons)', fontsize=14)

# Rotate x-axis labels for better readability
plt.xticks(rotation=45)

# Improve legend placement and style
legend = plt.legend(title="Countries", loc='upper left')
for label in legend.get_texts():
    label.set_fontsize(12)  # Set legend text size

# Adjust grid appearance
plt.grid(True)

# Display the plot
plt.tight_layout()  # Adjust spacing between elements for better layout
plt.show()


In [None]:
data_2 = pd.ExcelFile("carbon_data.xlsx")
df = pd.read_excel(data_2, sheet_name=4)
df

Then we clean the data by removing rows with missing values and duplicate rows.
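
As a quick illustration of what these two cleaning calls do, here is a sketch on an invented four-row table with one missing value and one duplicated row.

```python
import numpy as np
import pandas as pd

# Toy data (invented): row 1 duplicates row 0, row 2 has a missing value.
toy = pd.DataFrame({
    "Area": ["A", "A", "B", "C"],
    2020: [1.0, 1.0, np.nan, 3.0],
})

# dropna removes every row containing at least one NaN;
# drop_duplicates then keeps the first copy of identical rows.
cleaned = toy.dropna().drop_duplicates()
print(cleaned)
```

Note that `dropna()` with default arguments discards a whole row even if only a single year is missing, which can remove a lot of data from a wide table.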

In [None]:
df.dropna(inplace = True)
df.drop_duplicates(inplace = True)

Now we can start visualizing our data.


### #1 Total Carbon Emissions Trend for Armenia

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(df.columns[4:], df[df['Area'] == 'Armenia'].iloc[0, 4:], marker='o', linestyle='-', color='blue', linewidth=2, markersize=8)
plt.title('Total Carbon Emissions Trend for Armenia', fontsize=16)
plt.xlabel('Year', fontsize=14)
plt.ylabel('Total Carbon Emissions', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.grid(True)
plt.show()

This plot shows the carbon emissions trend for Armenia over time. The horizontal axis shows the years and the vertical axis the corresponding emissions, in the same units as the sheet (labeled kilotonnes in the CO2 plot above). Note that the code plots only the first Armenia row that remains after cleaning.

### #2 Total Carbon Emissions by (small) Region

In [None]:
# Names must match the 'Area' column exactly; isin() silently skips any that do not
europe = ['Germany', 'France', 'Italy', 'Spain', 'United Kingdom', 'Netherlands']
asia = ['China', 'India', 'Japan', 'South Korea']
north_america = ['United States of America', 'Canada', 'Mexico']
regions = {'Europe': europe, 'Asia': asia, 'North America': north_america}
region_emissions = {}
for region, countries in regions.items():
    region_emissions[region] = df[df['Area'].isin(countries)].iloc[:, 4:].sum(axis=0)
plt.figure(figsize=(12, 6))
for region, emissions in region_emissions.items():
    plt.plot(emissions.index, emissions.values, label=region)
plt.title('Total Carbon Emissions by Region')
plt.xlabel('Year')
plt.ylabel('Total Carbon Emissions')
plt.legend()
plt.grid(True)
plt.show()

This visualization shows the summed carbon emissions of three small groups of countries (Europe, Asia, and North America) over time. The horizontal axis shows the years, and the vertical axis shows the emissions summed over every row of the listed countries, in the units of the sheet.

### #3 Total Carbon Emissions in 2020


In [None]:
plt.figure(figsize=(10, 6))
plt.hist(df[2020], bins=20)
plt.title('Distribution of Total Carbon Emissions in 2020')
plt.xlabel('Total Carbon Emissions')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

The plot depicts a histogram with the horizontal axis representing the total carbon emissions, and the vertical axis indicating the frequency of occurrences of carbon emission values within specific bins.
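
To show how values end up in bins, here is a minimal sketch using `np.histogram`, which is what `plt.hist` relies on to compute its bar heights. The five values and the choice of three bins are invented for illustration.

```python
import numpy as np

# Invented emission values.
values = np.array([1.0, 2.0, 2.5, 9.0, 10.0])

# Split the range [min, max] into 3 equal-width bins and count values per bin.
counts, edges = np.histogram(values, bins=3)

print(counts)  # number of values in each bin
print(edges)   # bin boundaries, one more than the number of bins
```

With strongly skewed data such as national emissions, most values tend to land in the first bin, which is worth keeping in mind when reading the plot.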

### #4 Distribution of Total Carbon Emissions in 2009 vs 2020 for Syria

This code generates two histograms showing how the emission values recorded for Syria (one value per row of the sheet) are distributed in 2009 and in 2020.


In [None]:
syria_data = df[df['Area'] == 'Syrian Arab Republic'][[2009, 2020]]
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.hist(syria_data[2009], bins=20, color='skyblue')
plt.title('Distribution of Total Carbon Emissions in 2009 (Syria)')
plt.xlabel('Total Carbon Emissions')
plt.ylabel('Frequency')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.hist(syria_data[2020], bins=20, color='lightgreen')
plt.title('Distribution of Total Carbon Emissions in 2020 (Syria)')
plt.xlabel('Total Carbon Emissions')
plt.ylabel('Frequency')
plt.grid(True)

plt.tight_layout()  # no overlap
plt.show()


In [None]:
file_path = pd.ExcelFile("compiled_carbon_dataTM.xlsx")
df = pd.read_excel(file_path, sheet_name='Total Emissions Per Country')
display(df)

The code below performs the following steps:
1. Initializes a dictionary to store the top 3 countries per year
2. Calculates the sum of emissions for each country in each year and gets the top 3
3. Converts the dictionary to a DataFrame for easier manipulation
4. Transposes the DataFrame for plotting
5. Plots the results
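
The steps above (without the plot) can be sketched on a toy table. The area names and values are invented; the sketch only assumes, as in this notebook, that year columns are integers.

```python
import pandas as pd

# Toy data (invented): area "A" appears twice, so its rows are summed.
toy = pd.DataFrame({
    "Area": ["A", "A", "B", "C", "D"],
    2000: [5.0, 1.0, 4.0, 3.0, 2.0],
    2001: [1.0, 1.0, 6.0, 5.0, 4.0],
})

# For each year: total per area, then keep the three largest totals.
top_3 = {
    year: toy.groupby("Area")[year].sum().nlargest(3)
    for year in [2000, 2001]
}

# Rows become years, columns become every area that made a top 3.
table = pd.DataFrame(top_3).transpose()
print(table)
```

An area that is in the top 3 in one year but not in another gets a NaN for the missing year, so its line in the plot will have gaps.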

In [None]:
top_3_per_year = {}
years = list(range(2000, 2021))

for year in years: 
    top_3_per_year[year] = df.groupby('Area')[year].sum().nlargest(3)
 
top_3_per_year_df = pd.DataFrame(top_3_per_year)
transposed_df = top_3_per_year_df.transpose()

plt.figure(figsize=(15, 8))
for country in transposed_df.columns:
    plt.plot(years, transposed_df[country], label=country, marker='o')
plt.title('Top 3 Countries by Total Emissions Per Year')
plt.xlabel('Year')
plt.ylabel('Total Emissions (kilotonnes CO2 equivalent)')
plt.legend(title='Country')
plt.grid(True)
plt.xticks(years, rotation=45)
plt.tight_layout()
plt.show()

Next we find the top three emission items for each of four countries (China, the United States of America, India, and Brazil), with their totals summed across all years, using the following steps:
1. Identify the year columns, which may be stored as integers or as digit strings
2. Group by 'Area' and 'Item', then sum only the year columns
3. Define the top countries
4. Initialize a dictionary to store the top 3 items per country
5. For each country, sum emissions across all years for each item and keep the 3 items with the highest totals in the dictionary
6. Display the results
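
Here is a minimal sketch of the group-then-rank logic on a toy table with a single hypothetical country "X". All item names and values are invented, and the sketch keeps the top 2 instead of the top 3 to stay small.

```python
import pandas as pd

# Toy data (invented): "Energy" appears twice and is summed by the groupby.
toy = pd.DataFrame({
    "Area": ["X", "X", "X", "X"],
    "Item": ["Energy", "Waste", "Energy", "Transport"],
    2000: [3.0, 1.0, 2.0, 4.0],
    2001: [3.0, 1.0, 2.0, 0.5],
})

# Keep only the integer-named (year) columns.
year_cols = [c for c in toy.columns if isinstance(c, int)]

# Total per (Area, Item) pair for every year.
item_totals = toy.groupby(["Area", "Item"])[year_cols].sum()

# For country "X": add up the years per item, then keep the largest two.
top_2 = item_totals.loc["X"].sum(axis=1).nlargest(2)
print(top_2)
```

If the item categories overlap (for example, an aggregate item that already contains another item), the ranking will count some emissions twice, so the item definitions are worth checking.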

In [None]:
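# Accept integer labels and digit-only strings; skip any other label type safely
year_columns = [col for col in df.columns if isinstance(col, int) or (isinstance(col, str) and col.isdigit())]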

item_emissions = df.groupby(['Area', 'Item'])[year_columns].sum()
top_countries = ["China", "United States of America", "India", "Brazil"]
top_items_per_country = {}

for country in top_countries:
    total_emissions = item_emissions.loc[country].sum(axis=1)
    top_3_items = total_emissions.nlargest(3)
    top_items_per_country[country] = top_3_items

for country, items in top_items_per_country.items():
    print(f"{country}:")
    for item, total in items.items():
        print(f"{item}: {total / 1e6:.5f} million kilotonnes CO2 equivalent")
    print()

Now that we have identified the top contributing countries (China, the United States of America, India, and Brazil), we prepare a 2x2 grid of pie charts showing the top three emission items for each of them.
Each pie chart displays each item's share of the combined emissions of that country's top three items.
The steps are:
1. Enter the top 3 items per country and their emissions by hand, assuming they come from the previous step
2. Create a 2x2 grid of pie charts and flatten it into a 1D array for easier indexing
3. Iterate over each country and its top 3 items, creating a pie chart for each
4. Adjust the layout to prevent overlap and display the figure

In [None]:
top_items_per_country = {
    'China': {'Energy': 348288500, 'Agri-food systems': 70638320, 'IPPU': 50614080},
    'United States of America': {'Energy': 240097800, 'Agri-food systems': 47245000, 'IPCC Agriculture': 40559400},
    'India': {'Energy': 76514250, 'Agri-food systems': 51047480, 'Emissions on agricultural land': 32394680},
    'Brazil': {'Agri-food systems': 77177970, 'Emissions on agricultural land': 71641150, 'AFOLU': 52021460}
}

fig, axs = plt.subplots(2, 2, figsize=(12, 8))
axs = axs.flatten()

for idx, (country, items) in enumerate(top_items_per_country.items()):
    # Convert the dict views to plain lists before passing them to pie()
    sizes = list(items.values())
    labels = list(items.keys())
    axs[idx].pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
    axs[idx].set_title(f'Top 3 Emission Items for {country}')

plt.tight_layout()
plt.show()

Next we visualize another part of our dataset: climate change statistics per country for the years 1961 to 2022.

In [None]:
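# Fill missing values by linear interpolation along each country's row.
# axis=1 runs across the year columns; axis=0 would blend values from
# neighbouring countries instead of neighbouring years.
data = pd.read_excel(file_path, sheet_name='climate_change_indicators')
yearly_columns = data.columns[data.columns.get_loc('F1961'):data.columns.get_loc('F2022')+1]
data[yearly_columns] = data[yearly_columns].interpolate(method='linear', axis=1)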
data

From this dataset we want to extract the top 5 countries with the highest average temperature change over the whole period, using the following steps:
1. Calculate the mean temperature change for each country over the years
2. Sort the countries based on the average temperature change in descending order
3. Display the top 5 countries with the highest average temperature change
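
The three steps above can be sketched on a toy table. The country labels and values are invented; the `F1961`-style column names follow the naming used in this sheet, and the sketch keeps the top 2 rather than the top 5.

```python
import pandas as pd

# Toy data (invented values), one row per country, one column per year.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "F1961": [0.1, 0.5, 0.3],
    "F1962": [0.3, 0.7, 0.5],
})
cols = ["F1961", "F1962"]

# axis=1 averages across the year columns, giving one mean per row (country).
toy["Average_Temperature_Change"] = toy[cols].mean(axis=1)

# Largest averages first, then keep the first two rows.
top_2 = toy.sort_values("Average_Temperature_Change", ascending=False).head(2)
print(top_2[["Country", "Average_Temperature_Change"]])
```

The design choice here is a plain mean over all years, which treats early and recent decades equally.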

In [None]:
data['Average_Temperature_Change'] = data[yearly_columns].mean(axis=1)

sorted_data = data.sort_values(by='Average_Temperature_Change', ascending=False)

top_5_countries = sorted_data[['Country', 'Average_Temperature_Change']].head(5)
top_5_countries

A heatmap of the yearly values for these five countries:
1. Create a DataFrame suitable for the heatmap
2. Transpose the DataFrame to have countries as rows and years as columns
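
The layout the heatmap call receives has one row per country and one column per year. The sketch below builds that layout on invented data using `set_index`, which is an alternative shortcut and not the loop-and-transpose approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data (invented values) with F-prefixed year columns as in the sheet.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "F1961": [0.1, 0.5],
    "F1962": [0.3, 0.7],
    "F1963": [0.2, 0.9],
})
cols = ["F1961", "F1962", "F1963"]

# Countries become the row labels; only the year columns are kept.
grid = toy.set_index("Country")[cols]

# Replace the F-prefixed labels with plain integer years for the x-axis.
grid.columns = np.arange(1961, 1964)
print(grid)
```

A frame shaped like `grid` can be passed directly to `sns.heatmap`, with the row labels shown on the y-axis and the years on the x-axis.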

In [None]:
top_5_data = data[data['Country'].isin(top_5_countries['Country'])]
box_plot_data = [top_5_data[top_5_data['Country'] == country][yearly_columns].values.flatten() for country in top_5_countries['Country']]

heatmap_data = pd.DataFrame()
for country in top_5_countries['Country']:
    heatmap_data[country] = top_5_data[top_5_data['Country'] == country][yearly_columns].iloc[0].values

heatmap_data = heatmap_data.T
heatmap_data.columns = np.arange(1961, 2023)

plt.figure(figsize=(20, 10))
sns.heatmap(heatmap_data, annot=False, cmap='coolwarm', linewidths=.5)
plt.title('Heatmap of Yearly Temperature Changes for Top 5 Countries')
plt.xlabel('Year')
plt.ylabel('Country')
plt.show()

# In conclusion