### Importing necessary libraries and dropping NaN values

In [7]:
import pandas as pd
import numpy as np
consumption_df = pd.read_csv('Country_Consumption_TWH (1).csv')
consumption_df = consumption_df.iloc[:31]  # keep the 31 yearly rows (1990-2020); the rows after them are NaN
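
The slice above relies on the valid rows being exactly the first 31. A minimal sketch of a less position-dependent alternative, assuming the unwanted rows are entirely NaN, is `dropna(how="all")`. The small `toy_df` frame is a made-up stand-in for the CSV, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: two valid rows followed by two all-NaN rows.
toy_df = pd.DataFrame({
    "Year": [1990.0, 1991.0, np.nan, np.nan],
    "China": [874.0, 848.0, np.nan, np.nan],
    "Brazil": [141.0, 143.0, np.nan, np.nan],
})

# Drop only the rows in which every column is NaN, wherever they occur.
cleaned_df = toy_df.dropna(how="all")
print(len(cleaned_df))
```

If some rows were only partly empty, `how="all"` would keep them, so the choice between slicing and `dropna` depends on what the file actually contains.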

In [8]:
consumption_df.head()

Unnamed: 0,Year,China,United States,Brazil,Belgium,Czechia,France,Germany,Italy,Netherlands,...,Australia,New Zealand,Algeria,Egypt,Nigeria,South Africa,Iran,Kuwait,Saudi Arabia,United Arab Emirates
0,1990.0,874.0,1910.0,141.0,48.0,50.0,225.0,351.0,147.0,67.0,...,86.0,14.0,22.0,33.0,66.0,90.0,69.0,9.0,58.0,20.0
1,1991.0,848.0,1925.0,143.0,50.0,45.0,237.0,344.0,150.0,69.0,...,85.0,14.0,23.0,33.0,70.0,92.0,77.0,3.0,68.0,23.0
2,1992.0,877.0,1964.0,145.0,51.0,44.0,234.0,338.0,149.0,69.0,...,87.0,14.0,24.0,34.0,72.0,88.0,81.0,9.0,77.0,22.0
3,1993.0,929.0,1998.0,148.0,49.0,43.0,238.0,335.0,149.0,70.0,...,91.0,15.0,24.0,35.0,74.0,94.0,87.0,12.0,80.0,23.0
4,1994.0,973.0,2036.0,156.0,52.0,41.0,231.0,333.0,147.0,70.0,...,91.0,15.0,23.0,34.0,72.0,98.0,97.0,14.0,84.0,26.0


### Making Year the index of the dataframe

In [9]:
consumption_df.set_index("Year", inplace=True)

In [10]:
consumption_df.head()

Year,China,United States,Brazil,Belgium,Czechia,France,Germany,Italy,Netherlands,Poland,...,Australia,New Zealand,Algeria,Egypt,Nigeria,South Africa,Iran,Kuwait,Saudi Arabia,United Arab Emirates
1990.0,874.0,1910.0,141.0,48.0,50.0,225.0,351.0,147.0,67.0,103.0,...,86.0,14.0,22.0,33.0,66.0,90.0,69.0,9.0,58.0,20.0
1991.0,848.0,1925.0,143.0,50.0,45.0,237.0,344.0,150.0,69.0,101.0,...,85.0,14.0,23.0,33.0,70.0,92.0,77.0,3.0,68.0,23.0
1992.0,877.0,1964.0,145.0,51.0,44.0,234.0,338.0,149.0,69.0,99.0,...,87.0,14.0,24.0,34.0,72.0,88.0,81.0,9.0,77.0,22.0
1993.0,929.0,1998.0,148.0,49.0,43.0,238.0,335.0,149.0,70.0,101.0,...,91.0,15.0,24.0,35.0,74.0,94.0,87.0,12.0,80.0,23.0
1994.0,973.0,2036.0,156.0,52.0,41.0,231.0,333.0,147.0,70.0,96.0,...,91.0,15.0,23.0,34.0,72.0,98.0,97.0,14.0,84.0,26.0


### Calculating the average consumption per country to prepare a dataframe for the Isolation Forest model

In [11]:
# Mean consumption of each country (one value per column) across all years.
# This avoids shadowing the built-in sum() and hard-coding the row count.
average = consumption_df.mean().tolist()

In [16]:
consumption_df = consumption_df.T  # transpose: countries become rows

In [18]:
consumption_df["Average"] = average

In [20]:
consumption_df = consumption_df.T  # transpose back: years are rows again, with 'Average' as the last row
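
The transpose, assign, transpose sequence above can also be expressed in one step. The following is a minimal sketch on a made-up miniature frame (`toy_df` is hypothetical, with a few values borrowed from the table above), showing how a row of column means can be appended through `.loc`.

```python
import pandas as pd

# Hypothetical miniature of consumption_df: years as index, countries as columns.
toy_df = pd.DataFrame(
    {"China": [874.0, 848.0, 877.0], "Belgium": [48.0, 50.0, 51.0]},
    index=pd.Index([1990, 1991, 1992], name="Year"),
)

# Column-wise mean gives one average per country; assigning through .loc
# appends it as a new row labelled "Average".
toy_df.loc["Average"] = toy_df.mean()
print(toy_df.loc["Average"])
```

The right-hand side is evaluated before the new row exists, so the means are computed from the yearly rows only.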

---

## Anomaly Detection in Energy Consumption

The following cells apply an Isolation Forest to the per-country average consumption in order to flag countries whose averages are unusual. Isolation Forest is an unsupervised model that separates points with random splits: a point that can be isolated in only a few splits is scored as an anomaly.

### Transformations and Model Application

- **Dataframe Transpose**: `consumption_df` is transposed so that countries become rows, which lets the averages be added as a column; it is then transposed back, leaving 'Average' as a row.
- **Average Consumption**: The new 'Average' entry holds each country's mean consumption over the 31 years in the data.
- **Anomaly Detection**: An Isolation Forest model is fitted on the 'Average' values, giving an anomaly score and an inlier/outlier prediction for each country.

This step is critical in identifying data points that significantly differ from the dataset's overall patterns, which could indicate data entry errors, exceptional events, or the need for further investigation.
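
To make the isolation idea concrete, here is a simplified pure-Python sketch for one-dimensional data. It is not scikit-learn's implementation: it skips subsampling and rebuilds the random splits for every query. The function names, the `max_depth` cap, and the `averages` list are all illustrative assumptions (the values only loosely resemble the notebook's data).

```python
import random

def isolation_depth(values, target, rng, depth=0, max_depth=12):
    """Count the random splits needed to isolate `target` within `values`."""
    lo, hi = min(values), max(values)
    if len(values) <= 1 or lo == hi or depth >= max_depth:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that still contains the target.
    side = [v for v in values if (v < split) == (target < split)]
    return isolation_depth(side, target, rng, depth + 1, max_depth)

def mean_depth(values, target, n_trees=200, seed=0):
    """Average isolation depth of `target` over `n_trees` random split sequences."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees

# Illustrative values only: one very large entry among much smaller ones.
averages = [17.6, 48.0, 50.0, 67.0, 90.0, 141.0, 147.0, 2167.5]
print(mean_depth(averages, 2167.5))  # extreme value: isolated quickly
print(mean_depth(averages, 67.0))    # mid-range value: needs more splits
```

A shorter average depth corresponds to a more anomalous point, which is the quantity an Isolation Forest turns into its anomaly score.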

---


In [40]:
from sklearn.ensemble import IsolationForest

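# One row per country, with a single 'Average' feature column
average_df = pd.DataFrame(consumption_df.loc['Average'])
model = IsolationForest(n_estimators=100, max_samples='auto', max_features=1.0)
model.fit(average_df[['Average']])
# decision_function: negative scores mark outliers; predict: -1 = outlier, 1 = inlier
average_df['scores'] = model.decision_function(average_df[['Average']])
average_df['anomaly'] = model.predict(average_df[['Average']])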

In [42]:
anomaly = average_df.loc[average_df['anomaly']==-1]
anomaly_index = list(anomaly.index)
anomaly # Print outliers

Unnamed: 0,Average,scores,anomaly
China,1923.322581,-0.278363,-1
United States,2167.451613,-0.296222,-1
Germany,327.903226,-0.09247,-1
Russia,691.677419,-0.183531,-1
Canada,259.516129,-0.000716,-1
Japan,476.741935,-0.1288,-1
India,580.0,-0.13006,-1
New Zealand,17.612903,-0.001303,-1


### Results and Outlier Identification

- **Anomaly Scores**: Each country's average energy consumption is assigned an anomaly score; the more negative the score, the more the country deviates from the rest of the data.
- **Outlier Flagging**: Countries with a negative score are flagged with `-1` in the 'anomaly' column; all others receive `1`.
- **Outliers Output**: The countries identified as outliers based on their average energy consumption are displayed.


Here, New Zealand is flagged as an anomaly because of its unusually low consumption. For the purposes of this project, however, we are looking for the countries that consume the most energy, not the least. We therefore remove New Zealand from the DataFrame of outliers in the next cell.

In [43]:
anomaly = anomaly.drop(index="New Zealand")  # remove the low-consumption outlier
anomaly # DataFrame of countries that consumed the most energy (renewable + non-renewable) on average from 1990 to 2020.

Unnamed: 0,Average,scores,anomaly
China,1923.322581,-0.278363,-1
United States,2167.451613,-0.296222,-1
Germany,327.903226,-0.09247,-1
Russia,691.677419,-0.183531,-1
Canada,259.516129,-0.000716,-1
Japan,476.741935,-0.1288,-1
India,580.0,-0.13006,-1
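
Removing New Zealand by name works for this particular output, but it would have to be repeated by hand if other low-consumption countries were flagged. A hedged sketch of a rule-based alternative follows, assuming that "high consumption" can be read as "above the median average"; that threshold is an assumption, not something the notebook specifies. `toy_df` reuses three rows from the output above plus two made-up inlier rows.

```python
import pandas as pd

# Hypothetical excerpt of average_df: three flagged rows from the output above
# plus two invented inlier rows ("Country A", "Country B").
toy_df = pd.DataFrame(
    {"Average": [2167.451613, 17.612903, 327.903226, 60.0, 45.0],
     "anomaly": [-1, -1, -1, 1, 1]},
    index=["United States", "New Zealand", "Germany", "Country A", "Country B"],
)

# Keep flagged rows whose average lies above the median of all averages,
# so low-consumption outliers are excluded without naming them.
median_avg = toy_df["Average"].median()
high_outliers = toy_df[(toy_df["anomaly"] == -1) & (toy_df["Average"] > median_avg)]
print(list(high_outliers.index))
```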


## Conclusion

**China, the United States, Germany, Russia, Canada, Japan and India consume the most energy (in terawatt-hours) and are anomalies compared with the other countries. These countries in particular should therefore focus on producing renewable energy, so that the renewable share (%) of their energy production increases by 2047 for a sustainable future.**