# Name - Vasu Bansal
# Roll No. - 045055

# Project : Temperature Dataset Analysis

**Synopsis of the DataSet**

1. This dataset provides a comprehensive record of global temperature measurements over a period of 200 years.

2. Total Entries: The dataset encompasses temperature records with a total of 239177 observations, providing a comprehensive view of global temperature patterns.

3. Geographical Distribution: Temperature measurements are available for cities across different countries, showcasing the global nature of the dataset.

4. Temporal Span: The dataset covers a significant temporal span, allowing for the analysis of long-term temperature trends and patterns.

5. Temperature Metrics: Key temperature metrics include 'AverageTemperature' and 'AverageTemperatureUncertainty', providing insights into both the central tendency and the level of uncertainty associated with temperature measurements.

6. Geographic Coordinates: The dataset includes latitude and longitude information, allowing for precise location-based analysis.

7. Industrial Revolution Analysis: The dataset offers an opportunity to explore temperature trends in regions where the Industrial Revolution first started and compare them with regions where industrialization occurred later.

**Objectives:**

1. Analyze Global Temperature Trends: Evaluate changes in average global temperatures over the recorded years. Identify patterns or anomalies in global temperature variations.

2. Country-Specific Investigations: Examine temperature records for specific countries to understand regional climate patterns. Identify countries with significant temperature fluctuations or trends.

3. Industrial Revolution Impact: Assess the correlation between industrialization periods and temperature changes in different regions. Analyze how early industrialization impacted temperature compared to regions that industrialized later.

4. Latitude and Longitude Effects: Investigate temperature variations based on geographical coordinates (latitude and longitude). Identify regions with notable temperature differences due to their geographic locations.

5. Uncertainty Analysis: Assess the level of uncertainty in temperature measurements and identify regions with higher variability in recorded temperatures.

This dataset provides valuable insights into global temperature patterns and allows for the examination of factors such as industrialization, geographical location, and long-term climate trends.

In [None]:
import pandas as pd
data = pd.read_csv('GTD.csv')
data.head()

In [None]:
data.shape

In [None]:
data.info()

Descriptive Statistics

In [None]:
data.describe()
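
The synopsis figures (total entries, temporal span, geographic coverage) can be checked against the data itself by counting missing values and reading off the date range. The sketch below uses a tiny made-up frame in place of `GTD.csv`; the column names follow those used elsewhere in this notebook, and every value in `sample` is illustrative only.

```python
import pandas as pd

# Tiny made-up frame standing in for `data`; values are illustrative, not from GTD.csv.
sample = pd.DataFrame({
    "date": ["1850-01-01", "1850-02-01", "1900-01-01", "2013-09-01"],
    "AverageTemperature": [25.1, None, 26.0, 27.3],
    "AverageTemperatureUncertainty": [1.2, None, 0.6, 0.3],
    "City": ["Delhi", "Delhi", "Nairobi", "Nairobi"],
    "Country": ["India", "India", "Kenya", "Kenya"],
})

# Number of missing values in each column (these rows are what dropna() would remove)
missing_per_column = sample.isna().sum()

# First and last year covered by the records
years = pd.to_datetime(sample["date"]).dt.year
first_year, last_year = int(years.min()), int(years.max())

# Geographic coverage
n_cities = sample["City"].nunique()
n_countries = sample["Country"].nunique()

print(missing_per_column)
print(f"Span: {first_year} to {last_year}, {n_cities} cities in {n_countries} countries")
```

Running the same three checks on `data` would show how many rows `dropna()` removes later in the notebook and what period the records actually span.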

# Exploratory Data Analysis (EDA)

**Because the dataset is so large, exploratory data analysis of every country is not practical. Instead, we select 10 countries from the dataset, a mix of developed and developing countries, and analyze their temperature trends.**

**Selected Countries-**
1. India
2. United States of America
3. United Kingdom
4. South Korea
5. Kenya
6. Japan
7. Australia
8. Germany
9. Brazil 
10. Ukraine


**1. Analyzing Temperature Trends**

In [None]:
# Plot the yearly temperature trend for each country
import pandas as pd
import plotly.express as px

# Drop rows with missing values
data = data.dropna()

# Countries to plot, named as they appear in the 'Country' column
countries_to_plot = ['India', 'United States', 'South Korea', 'Kenya', 'Japan',
                     'Australia', 'Germany', 'Brazil', 'Ukraine']

for selected_country in countries_to_plot:
    # Filter data for the selected country; .copy() avoids pandas' SettingWithCopyWarning
    country_data = data[data['Country'] == selected_country].copy()

    # Convert the 'date' column to datetime format
    country_data['date'] = pd.to_datetime(country_data['date'])

    # Extract the 'year' from the date
    country_data['year'] = country_data['date'].dt.year

    # Group the data by year and calculate the mean temperature
    yearly_temperature = country_data.groupby('year')['AverageTemperature'].mean().reset_index()

    # Create the line plot
    line_plot = px.line(yearly_temperature, x='year', y='AverageTemperature',
                        labels={'AverageTemperature': 'Average Temperature (in °C)'},
                        title=f'Temperature Trend Over the Years in {selected_country}')
    line_plot.update_layout(transition_duration=500)

    # Display the plot
    line_plot.show()


**Observations-**

1. India: In 1797, the average temperature was 26.7°C, followed by a slight decrease. In 1862, the average temperature dropped to 22.6°C. Notable increases occurred in 1867, reaching 25.6°C. A significant rise to 27°C was observed in 2009.
Possible influences: Agricultural practices, industrialization, and global climate change.


2. United States of America: In 1804, the average temperature was 10.04°C, witnessing a constant increase over the years. In 2012, the average temperature was 13.9°C.
Possible influences: Industrialization, urbanization, and climate change.


3. United Kingdom: In 1840, the average temperature was 9.4°C. It rose steadily over the following years, reaching 12.2°C in 2007.
Possible influences: Industrial revolution, urban development, and climate change.


4. South Korea: In 1850, the average temperature was 15.5°C, with fluctuations. A significant drop to 15.2°C was observed in 1906, followed by a constant increase to 17.2°C in 2005.
Possible influences: Industrialization, economic growth, and regional climate patterns.


5. Kenya: In 1850, the average temperature was 15.3°C, with fluctuations. A significant drop to 15.2°C occurred in 1906, followed by a constant increase to 17.2°C in 2005.
Possible influences: Climate variability, deforestation, and agricultural practices.


6. Japan: In 1850, the average temperature was 12.8°C, with fluctuations. A notable drop to 12.3°C occurred in 1894, followed by a constant increase to 15.18°C in 2004.
Possible influences: Industrialization, technological advancement, and global climate patterns.


7. Australia: In 1850, the average temperature was 14.1°C, with fluctuations. A notable decrease to 14.48°C occurred in 1949, followed by an increase to 16.5°C in 2007.
Possible influences: Droughts, heatwaves, and climate change.


8. Germany: In 1850, the average temperature was 7.55°C. A notable increase was observed in 1934, reaching 10.69°C. Temperature has mostly remained constant with few ups and downs.
Possible influences: Industrialization, World Wars, and post-war reconstruction.


9. Brazil: In 1850, the average temperature was 26.3°C, with fluctuations. A notable decrease to 21.42°C occurred in 1865, followed by a constant increase to 24.26°C in 2004.
Possible influences: Deforestation, urbanization, and global climate change.


10. Ukraine: In 1850, the average temperature was 6.4°C. A notable increase was observed in 1939, reaching 8.187°C. The temperature increased to 10.33°C in 2013.
Possible influences: Soviet industrialization, agricultural practices, and climate change.


These observations highlight temperature trends in each country and suggest possible factors such as industrialization, climate change, and regional events that might have contributed to these trends.
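
The observations above compare individual years read off the plots. One way to summarize a whole series in a single number is the slope of a least-squares line through the yearly means. The sketch below is a minimal illustration, not the method used in this notebook: `warming_rate_per_decade` is a hypothetical helper, and `demo` is made-up data that rises by exactly 0.01°C per year so the result is easy to check.

```python
import numpy as np
import pandas as pd


def warming_rate_per_decade(yearly: pd.DataFrame) -> float:
    """Slope of a least-squares line through the yearly means, in °C per decade.

    Expects the columns 'year' and 'AverageTemperature', as in `yearly_temperature`.
    """
    slope, _intercept = np.polyfit(yearly["year"], yearly["AverageTemperature"], deg=1)
    return float(slope * 10)


# Made-up series: starts at 20.0 °C in 1900 and rises by 0.01 °C every year.
demo = pd.DataFrame({"year": np.arange(1900, 2000)})
demo["AverageTemperature"] = 20.0 + 0.01 * (demo["year"] - 1900)

rate = warming_rate_per_decade(demo)
print(f"Estimated trend: {rate:.3f} °C per decade")
```

Passing each country's `yearly_temperature` frame to the helper would give comparable per-decade rates. A straight line is a coarse summary, so series with large early fluctuations may be better described over a restricted range of years.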

**2. Country-Specific Investigations + 3. Industrial Revolution Impact**

In [None]:
import pandas as pd
import plotly.express as px

# Drop rows with missing values
data = data.dropna()

# List of selected countries, named as they appear in the 'Country' column
selected_countries = ['India', 'United States', 'United Kingdom', 'South Korea', 'Kenya',
                       'Japan', 'Australia', 'Germany', 'Brazil', 'Ukraine']

# Create a new DataFrame for selected countries; .copy() avoids SettingWithCopyWarning
selected_countries_data = data[data['Country'].isin(selected_countries)].copy()

# Convert the 'date' column to datetime format
selected_countries_data['date'] = pd.to_datetime(selected_countries_data['date'])

# Extract the 'year' from the date
selected_countries_data['year'] = selected_countries_data['date'].dt.year

# Group the data by country and year, and calculate the mean temperature
temperature_by_country_year = selected_countries_data.groupby(['Country', 'year'])['AverageTemperature'].mean().reset_index()

# Create a line plot with one colored line per country
line_plot = px.line(temperature_by_country_year, x='year', y='AverageTemperature', color='Country',
                    labels={'AverageTemperature': 'Average Temperature (in °C)', 'year': 'Year'},
                    title='Temperature Trends in Selected Countries Over the Years')

# Update layout for better visualization
line_plot.update_layout(transition_duration=500)

# Display the plot
line_plot.show()


In [None]:
import pandas as pd
import plotly.express as px

# Drop rows with missing values
data = data.dropna()

# List of given countries, named as they appear in the 'Country' column
given_countries = ['India', 'United States', 'United Kingdom', 'South Korea', 'Kenya',
                    'Japan', 'Australia', 'Germany', 'Brazil', 'Ukraine']

# Create an empty list to store individual line plots for each city in each country
line_plots = []

# Loop through each given country
for country in given_countries:
    # Filter data for the current country
    country_data = data[data['Country'] == country]

    # Get unique cities for the current country
    cities_in_country = country_data['City'].unique()

    # Loop through each city in the current country
    for city in cities_in_country:
        # Filter data for the current city; .copy() avoids SettingWithCopyWarning
        city_data = country_data[country_data['City'] == city].copy()

        # Convert the 'date' column to datetime format
        city_data['date'] = pd.to_datetime(city_data['date'])

        # Extract the 'year' from the date
        city_data['year'] = city_data['date'].dt.year

        # Group the data by year and calculate the mean temperature
        yearly_temperature = city_data.groupby(['City', 'year'])['AverageTemperature'].mean().reset_index()

        # Create a line plot for the temperature trend in this city
        line_plot = px.line(yearly_temperature, x='year', y='AverageTemperature',
                            labels={'AverageTemperature': 'Average Temperature (in °C)', 'year': 'Year'},
                            title=f'Temperature Trend Over the Years in {city}, {country}',
                            line_group='City')

        # Add the line plot to the list
        line_plots.append(line_plot)

# Display all the line plots
for plot in line_plots:
    plot.update_layout(transition_duration=500)
    plot.show()


**Observations-**

The plots show city-level temperature trends for the selected countries over the recorded period.

The first phase of the Industrial Revolution started in the 1760s and lasted until the 1840s. It reached the USA in the 1790s and Asian, African, and Australian countries in the late 1850s. The graphs suggest that all the selected countries saw a rise in average temperature over the same period, followed by a decline, possibly as awareness increased. Since then, temperatures have risen steadily, which is consistent with the release of greenhouse gases and global warming.
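
The comparison between periods can be made explicit by averaging the yearly means before and after a chosen cutoff year. The sketch below is only an illustration under stated assumptions: `mean_before_after` is a hypothetical helper, the cutoff of 1850 is an arbitrary example rather than an established industrialization date for any country, and `demo` is made-up data. A difference in means shows association, not cause.

```python
import pandas as pd


def mean_before_after(yearly: pd.DataFrame, cutoff_year: int):
    """Return (mean before cutoff, mean from cutoff onward, difference).

    Expects the columns 'year' and 'AverageTemperature'.
    """
    before = yearly.loc[yearly["year"] < cutoff_year, "AverageTemperature"].mean()
    after = yearly.loc[yearly["year"] >= cutoff_year, "AverageTemperature"].mean()
    return float(before), float(after), float(after - before)


# Made-up yearly means for one country; values are illustrative only.
demo = pd.DataFrame({
    "year": [1840, 1845, 1850, 1855, 1860],
    "AverageTemperature": [10.0, 10.2, 10.6, 10.8, 11.0],
})

before, after, change = mean_before_after(demo, cutoff_year=1850)
print(f"Before: {before:.2f} °C, after: {after:.2f} °C, change: {change:+.2f} °C")
```

Applying the helper per country, with a cutoff chosen from a documented industrialization date, would let early and late industrializers be compared on the same footing.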

**4. Latitude and Longitude Effects**

In [None]:
import pandas as pd
import plotly.express as px

# Drop NA values
data = data.dropna()

# Scatter plot for latitude and longitude with color-coded average temperature
scatter_plot = px.scatter(data, x='Longitude', y='Latitude', color='AverageTemperature',
                          labels={'AverageTemperature': 'Average Temperature (in °C)', 'Longitude': 'Longitude', 'Latitude': 'Latitude'},
                          title='Temperature Variations Based on Geographical Coordinates',
                          size_max=20, opacity=0.7)

# Display the scatter plot
scatter_plot.show()

# Find the date on which each country recorded its highest average temperature
max_temp_years = data.groupby(['Country', 'date'])['AverageTemperature'].mean().reset_index()
max_temp_years = max_temp_years.loc[max_temp_years.groupby('Country')['AverageTemperature'].idxmax()].reset_index(drop=True)

# Number the rows from 1 instead of 0
max_temp_years.index = max_temp_years.index + 1

print("Date when each country reached their highest average temperature:")
print(max_temp_years[['Country', 'date', 'AverageTemperature']])


**Observation-**
The scatter plot suggests that the average temperature rises as we move towards the north-west.
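
A directional reading of a scatter plot depends on the coordinates being numeric. The sketch below assumes, purely for illustration, that coordinates might be stored as strings with a hemisphere suffix such as `'28.13N'` or `'3.23W'`. If the `Latitude` and `Longitude` columns in this dataset are already signed numbers, the conversion step is unnecessary. `to_signed_degrees` is a hypothetical helper and `demo` is made-up data.

```python
import pandas as pd


def to_signed_degrees(coord: str) -> float:
    """Convert a string like '23.31S' to a signed float (south and west are negative)."""
    value, hemisphere = float(coord[:-1]), coord[-1].upper()
    return -value if hemisphere in ("S", "W") else value


# Made-up records; the string format is an assumption, not confirmed for GTD.csv.
demo = pd.DataFrame({
    "Latitude": ["5.63N", "28.13N", "50.63N", "23.31S"],
    "Longitude": ["3.23W", "77.27E", "6.34E", "46.31W"],
    "AverageTemperature": [27.0, 25.0, 9.0, 22.0],
})

demo["lat_deg"] = demo["Latitude"].map(to_signed_degrees)
demo["lon_deg"] = demo["Longitude"].map(to_signed_degrees)

# Distance from the equator, regardless of hemisphere
demo["abs_lat"] = demo["lat_deg"].abs()

# Pearson correlation between distance from the equator and temperature
corr = demo["abs_lat"].corr(demo["AverageTemperature"])
print(f"Correlation between |latitude| and temperature: {corr:.2f}")
```

A correlation computed this way on the real data would put a number on how strongly temperature varies with latitude, which is easier to compare across analyses than a visual impression.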

**5. Uncertainty Analysis**

In [None]:
import pandas as pd
import plotly.express as px

# Drop NA values
data = data.dropna()

# Box plot for uncertainty in temperature measurements
box_plot = px.box(data, x='Country', y='AverageTemperatureUncertainty',
                  labels={'AverageTemperatureUncertainty': 'Temperature Uncertainty', 'Country': 'Country'},
                  title='Uncertainty in Temperature Measurements Across Countries')

# Display the box plot
box_plot.show()


In [None]:
# Summary statistics for temperature uncertainty
uncertainty_stats = data['AverageTemperatureUncertainty'].describe()

print("Summary Statistics for Temperature Uncertainty:")
print(uncertainty_stats)
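
The summary statistics describe the dataset as a whole. To identify which countries carry the most uncertainty, the same column can be summarized per country and ranked. The sketch below is one possible approach on made-up data; the choice of median and maximum as the ranking statistics is an assumption, not something this notebook prescribes.

```python
import pandas as pd

# Made-up records; values are illustrative only.
demo = pd.DataFrame({
    "Country": ["India", "India", "Kenya", "Kenya", "Japan", "Japan"],
    "AverageTemperatureUncertainty": [0.4, 0.6, 1.5, 2.5, 0.2, 0.4],
})

# Median and maximum uncertainty per country, highest median first
by_country = (
    demo.groupby("Country")["AverageTemperatureUncertainty"]
    .agg(["median", "max"])
    .sort_values("median", ascending=False)
)

print(by_country)
```

The median is less sensitive than the mean to a handful of very uncertain early records, which is why it is used for the ranking here.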

**Observations -**

1. Count: The dataset consists of 228,175 temperature records with associated uncertainty values.

2. Mean: The average temperature uncertainty across all records is approximately 0.97 units. This gives a general sense of the typical level of uncertainty in the temperature measurements.

3. Standard Deviation (Std): The standard deviation of around 0.98 indicates the degree of variability or dispersion in the temperature uncertainty values. A higher standard deviation suggests greater variability.

4. Minimum Value: The minimum uncertainty value is 0.04, which represents the lowest level of uncertainty recorded in the dataset. This might indicate instances where temperature measurements are relatively more certain.

5. 25th Percentile (Q1): The first quartile (25th percentile) has a value of 0.34. This suggests that 25% of the uncertainty values fall below this threshold, reflecting a lower range of uncertainties for a significant portion of the dataset.

6. Median (50th Percentile): The median uncertainty value is 0.592, indicating the middle point of the dataset. Half of the uncertainty values are below this value, and half are above.

7. 75th Percentile (Q3): The third quartile (75th percentile) has a value of 1.32. This implies that 75% of the uncertainty values fall below this threshold, representing a higher range of uncertainties for a smaller portion of the dataset.

8. Maximum Value: The maximum uncertainty value recorded is 14.037, signifying the highest level of uncertainty observed in the dataset. This could be an outlier or an extreme case where temperature measurements are notably uncertain.
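
Whether the maximum is an outlier can be checked with the common 1.5 × IQR rule, using only the quartiles reported above. The sketch below is a worked calculation on those reported figures. The 1.5 multiplier is the conventional box-plot threshold, chosen here as an assumption rather than taken from this notebook.

```python
# Quartiles and maximum as reported in the observations above
q1, q3 = 0.34, 1.32
reported_max = 14.037

# Interquartile range and the conventional upper fence for outliers
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

is_outlier = reported_max > upper_fence
print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}, maximum is an outlier: {is_outlier}")
```

With an IQR of 0.98 the upper fence is 2.79, so the maximum of 14.037 lies far beyond it and counts as an outlier under this rule.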

**Racing Bar Chart for the Countries over the years**

In [None]:
import pandas as pd
import bar_chart_race as bcr

# Drop NA values
data = data.dropna()

# Select any 10 countries for the racing bar chart
selected_countries = ['India', 'United States', 'United Kingdom', 'South Korea', 'Kenya',
                       'Japan', 'Australia', 'Germany', 'Brazil', 'Ukraine']

# Filter data for selected countries; .copy() avoids SettingWithCopyWarning
filtered_data = data[data['Country'].isin(selected_countries)].copy()

# Convert the 'date' column to datetime format
filtered_data['date'] = pd.to_datetime(filtered_data['date'])

# Extract the 'year' from the date
filtered_data['year'] = filtered_data['date'].dt.year

# Calculate the mean temperature by year and country
total_temperature = filtered_data.groupby(['year', 'Country']).agg({'AverageTemperature': 'mean'}).reset_index()

# Pivot the data for bar chart race
pivot_temperature = total_temperature.pivot_table(index='year', columns='Country', values='AverageTemperature')

# Creating the bar chart race
bcr.bar_chart_race(
    df=pivot_temperature,
    figsize=(10, 6),
    title='Change in Average Temperature by Year',
    cmap='dark12',  # Choose a colormap (optional)
    n_bars=10,  # Set the number of bars to display
    period_length=500,  # Set the length of each period in milliseconds
    steps_per_period=10,  # Set the number of steps (frames) for each period
    bar_kwargs={'alpha': 0.8},  # Set transparency for bars
    shared_fontdict={'family' : 'Arial', 'weight' : 'bold', 'size' : 12},
    scale='linear',  # Use linear scaling for the axis
    title_size=18,
    tick_label_size=10,  # Set the size of tick labels
    bar_label_size=10,  # Set the size of bar labels
    filter_column_colors=True,  # Assign colors only to countries that actually appear in the chart
    period_summary_func=lambda v, r: {'x': 0.99, 'y': 0.04, 
                                      's': f'Global Average Temperature: {v.mean():,.2f}°C',
                                      'ha': 'right', 'size': 10},
    perpendicular_bar_func='mean',  # Set the perpendicular bar to represent the mean temperature
    bar_size=.95,  # Set the width of the bars
    period_label={'x': .99, 'y': .25, 'ha': 'right', 'fontsize': 14},
)

# Managerial Implications

**1. Analyzing Temperature Trends:**

The temperature trends in each country highlight significant historical patterns and fluctuations. Managers should consider these trends while planning for infrastructure, agriculture, and public health initiatives.
Understanding the possible influences, such as industrialization, urbanization, and climate change, allows policymakers to make informed decisions about sustainable development and environmental policies.

**2. Country-Specific Investigations and Industrial Revolution Impact:**

Country-specific investigations provide insights into local climate patterns, influencing factors, and potential areas of concern. Managers can use this information for disaster preparedness, resource allocation, and infrastructure planning.
The alignment of temperature trends with the phases of the Industrial Revolution emphasizes the long-term impact of industrialization on climate. Policymakers should focus on sustainable practices and emission reduction strategies.

**3. Latitude and Longitude Effects:**

The analysis of latitude and longitude effects helps identify regions that experience varying temperature patterns. Managers can use this information for climate adaptation strategies, especially in areas prone to extreme temperatures.
Recognizing the date when each country reached its highest average temperature provides valuable insights for climate change adaptation and mitigation efforts. Early identification allows for proactive measures to address temperature-related challenges.

**4. Uncertainty Analysis:**

The uncertainty analysis provides crucial information about the reliability of temperature measurements. Managers should be aware of the level of uncertainty associated with temperature data when making decisions based on climate information.
Higher uncertainty values may indicate areas where additional data collection or advanced monitoring technologies are needed. Managers should invest in improving data accuracy and reliability for better decision-making.

**Overall Managerial Considerations:**

Considering the impact of historical events, such as industrialization and global climate change, managers should adopt sustainable practices and policies.
Climate variability and uncertainties in temperature measurements underscore the importance of flexible and adaptive strategies in various sectors like agriculture, infrastructure, and public health.

These observations empower managers and policymakers to make informed decisions considering historical temperature trends, geographical influences, and the reliability of temperature data. Sustainable development practices and climate-resilient policies can contribute to long-term environmental and economic stability.