**Uploading the dataset in Google Colab**

In [None]:
from google.colab import files
uploaded = files.upload()

Saving 3- co-emissions-per-capita.csv to 3- co-emissions-per-capita.csv


**Importing required Python libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import io

**Showing Dataset**

In [None]:
data = pd.read_csv(io.BytesIO(uploaded['3- co-emissions-per-capita.csv']))
print(data)

            Entity Code  Year  Annual CO₂ emissions (per capita)
0      Afghanistan  AFG  1949                           0.001992
1      Afghanistan  AFG  1950                           0.011266
2      Afghanistan  AFG  1951                           0.012098
3      Afghanistan  AFG  1952                           0.011946
4      Afghanistan  AFG  1953                           0.013685
...            ...  ...   ...                                ...
26595     Zimbabwe  ZWE  2018                           0.711830
26596     Zimbabwe  ZWE  2019                           0.636645
26597     Zimbabwe  ZWE  2020                           0.500945
26598     Zimbabwe  ZWE  2021                           0.524972
26599     Zimbabwe  ZWE  2022                           0.542628

[26600 rows x 4 columns]


**head()**

The head() function shows the first rows of the dataset. By default it shows the first 5 rows. To see a different number of rows, pass that number to the function. For example:

data.head(10)

In [None]:
data.head()

Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita)
0,Afghanistan,AFG,1949,0.001992
1,Afghanistan,AFG,1950,0.011266
2,Afghanistan,AFG,1951,0.012098
3,Afghanistan,AFG,1952,0.011946
4,Afghanistan,AFG,1953,0.013685


**describe()**

The describe() function gives a statistical summary of the dataset. For each numeric column it reports the count of non-missing values, the mean, the standard deviation, the minimum, the maximum, and the 25th, 50th and 75th percentiles.

In [None]:
data.describe()

Unnamed: 0,Year,Annual CO₂ emissions (per capita)
count,26600.0,26600.0
mean,1949.09688,3.711042
std,56.387496,14.295633
min,1750.0,0.0
25%,1915.0,0.132211
50%,1963.0,0.933317
75%,1994.0,4.150357
max,2022.0,771.8865
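
The output above covers only Year and the emissions column because describe() skips non-numeric columns by default. The sketch below shows the difference on a small made-up frame (the `toy` data is hypothetical, not taken from the CO₂ dataset); passing `include="all"` also summarizes text columns.

```python
import pandas as pd

# Hypothetical toy frame with one text column and one numeric column.
toy = pd.DataFrame({
    "Entity": ["A", "A", "B"],
    "Year": [2019, 2020, 2020],
})

# Default: only numeric columns are summarized.
numeric_summary = toy.describe()
print(numeric_summary)

# include="all" adds text columns, with rows such as unique, top and freq.
full_summary = toy.describe(include="all")
print(full_summary)
```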


**Checking Missing Values**

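isnull() is used to check for missing values in the dataset. It returns a table of the same shape in which each cell is True if the value is missing and False otherwise.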

In [None]:
data.isnull()

Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita)
0,False,False,False,False
1,False,False,False,False
2,False,False,False,False
3,False,False,False,False
4,False,False,False,False
...,...,...,...,...
26595,False,False,False,False
26596,False,False,False,False
26597,False,False,False,False
26598,False,False,False,False
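
A table of True/False values is hard to read for 26600 rows, so a common follow-up is to count the missing values per column. This is a minimal sketch on a made-up frame (the `toy` data and its column names are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values.
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Code": ["AAA", "AAA", None, None],
    "Year": [2019, 2020, 2019, 2020],
    "Emissions": [1.0, np.nan, 3.0, 4.0],
})

# sum() counts the True values, giving the number of missing values per column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True if there is at least one missing value anywhere in the frame.
has_missing = toy.isnull().values.any()
print(has_missing)
```

The same two calls can be applied to `data` to see which columns actually need cleaning.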


**Replacing Missing Values in dataset using mean()**

instance['required column name'] = instance['required column name'].fillna(instance['required column name'].mean())

instance = data

required column name = name of the column whose missing values we want to replace with some value

fillna() = function that replaces the null values of a column with the value passed to it

mean() = method that computes the mean (average) of the column, ignoring null values. Passing the result to fillna() replaces every null value with the column mean.

In [None]:
data['Annual CO₂ emissions (per capita)'] = data['Annual CO₂ emissions (per capita)'].fillna(data['Annual CO₂ emissions (per capita)'].mean())
data

Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita)
0,Afghanistan,AFG,1949,0.001992
1,Afghanistan,AFG,1950,0.011266
2,Afghanistan,AFG,1951,0.012098
3,Afghanistan,AFG,1952,0.011946
4,Afghanistan,AFG,1953,0.013685
...,...,...,...,...
26595,Zimbabwe,ZWE,2018,0.711830
26596,Zimbabwe,ZWE,2019,0.636645
26597,Zimbabwe,ZWE,2020,0.500945
26598,Zimbabwe,ZWE,2021,0.524972
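
To make the effect of the replacement visible, here is a small sketch on made-up numbers (the `toy` frame and the `Emissions` column name are hypothetical). It assumes the default behaviour of mean(), which skips missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
toy = pd.DataFrame({"Emissions": [1.0, np.nan, 3.0, 5.0]})

# mean() skips NaN by default, so the mean is (1 + 3 + 5) / 3 = 3.0.
column_mean = toy["Emissions"].mean()

# Replace the missing value with the column mean.
toy["Emissions"] = toy["Emissions"].fillna(column_mean)
print(toy)
```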


**Reshaping the dataset**

When reshaping the dataset, we change some values of a specific column to numbers or characters so that they are more convenient to use while analysing the data.

map() = maps each old value to a new value. For example, the code for Afghanistan is AFG; to change it to 0, we map AFG to 0 using the map function. Any value that is not listed in the mapping becomes NaN. Because NaN is a float, a column mapped to integers is stored as float whenever some values are left unmapped.

In [None]:
data['Code'] = data['Code'].map({'AFG':0,'ZWE':1})
data

Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita)
0,Afghanistan,0.0,1949,0.001992
1,Afghanistan,0.0,1950,0.011266
2,Afghanistan,0.0,1951,0.012098
3,Afghanistan,0.0,1952,0.011946
4,Afghanistan,0.0,1953,0.013685
...,...,...,...,...
26595,Zimbabwe,1.0,2018,0.711830
26596,Zimbabwe,1.0,2019,0.636645
26597,Zimbabwe,1.0,2020,0.500945
26598,Zimbabwe,1.0,2021,0.524972
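
The sketch below reproduces this behaviour on three made-up rows and shows one possible alternative, `pd.factorize`, which assigns an integer to every distinct value so that nothing turns into NaN. Using factorize here is a suggestion, not part of the original notebook.

```python
import pandas as pd

codes = pd.Series(["AFG", "ALB", "ZWE"])

# Only AFG and ZWE are in the dictionary, so ALB becomes NaN
# and the result is stored as float64.
partial = codes.map({"AFG": 0, "ZWE": 1})
print(partial)

# factorize() numbers the distinct values in order of first appearance.
labels, uniques = pd.factorize(codes)
print(labels)
print(list(uniques))
```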


**Filtering of Data**

In filtering, we select only the rows we want to work on. Rows are selected with comparison operators such as ==, > and <, which produce a True/False value for every row; only the rows marked True are kept.

In [None]:
data = data[data['Year'] == 2020]
data

Unnamed: 0,Entity,Code,Year,Annual CO₂ emissions (per capita)
71,Afghanistan,0.0,2020,0.305039
299,Africa,,2020,0.998217
389,Albania,,2020,1.750674
496,Algeria,,2020,3.909925
724,Andorra,,2020,4.808461
...,...,...,...,...
26103,Wallis and Futuna,,2020,2.196079
26331,World,,2020,4.464730
26404,Yemen,,2020,0.337132
26477,Zambia,,2020,0.430287
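
Several conditions can be combined in one filter. The following sketch uses a made-up frame (the `toy` data and column names are hypothetical) and stores the result in a new variable, so the original frame stays available.

```python
import pandas as pd

# Hypothetical toy frame.
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Year": [2019, 2020, 2019, 2020],
    "Emissions": [0.5, 0.7, 4.0, 5.0],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses.
recent_high = toy[(toy["Year"] == 2020) & (toy["Emissions"] > 1.0)]
print(recent_high)

# isin() keeps the rows whose value appears in the given list.
only_2019 = toy[toy["Year"].isin([2019])]
print(only_2019)
```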


**Deletion of Column or Row**

drop() = function used to delete a column or row from a dataset

axis = specifies whether the given labels refer to rows or columns

if axis = 1 then we delete a column

if axis = 0 then we delete a row

In [None]:
data = data.drop(['Entity'],axis=1)
data

Unnamed: 0,Code,Year,Annual CO₂ emissions (per capita)
71,0.0,2020,0.305039
299,,2020,0.998217
389,,2020,1.750674
496,,2020,3.909925
724,,2020,4.808461
...,...,...,...
26103,,2020,2.196079
26331,,2020,4.464730
26404,,2020,0.337132
26477,,2020,0.430287
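
For comparison, this sketch drops a column and a row from a small made-up frame (the `toy` data is hypothetical; its index labels merely imitate the ones in the output above). drop() returns a new frame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical toy frame with non-consecutive index labels.
toy = pd.DataFrame(
    {"Entity": ["A", "B", "C"], "Year": [2020, 2020, 2020]},
    index=[71, 299, 389],
)

# axis=1 deletes a column; equivalent to toy.drop(columns=["Entity"]).
without_column = toy.drop(["Entity"], axis=1)

# axis=0 deletes a row by its index label; equivalent to toy.drop(index=[299]).
without_row = toy.drop([299], axis=0)

print(without_column)
print(without_row)
```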


**Removing Duplicates**

We can keep only the non-duplicate rows using duplicated() and ~

~ = not (inverts True and False)

duplicated() = marks a row as True if its value in the specified column has already appeared in an earlier row

In [None]:
non_duplicate = data[~data.duplicated('Annual CO₂ emissions (per capita)')]
non_duplicate

Unnamed: 0,Code,Year,Annual CO₂ emissions (per capita)
71,0.0,2020,0.305039
299,,2020,0.998217
389,,2020,1.750674
496,,2020,3.909925
724,,2020,4.808461
...,...,...,...
26103,,2020,2.196079
26331,,2020,4.464730
26404,,2020,0.337132
26477,,2020,0.430287
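
pandas also provides `drop_duplicates`, which should give the same result as the mask above in one call. A minimal sketch on made-up data (the `toy` frame and the `Emissions` column name are hypothetical):

```python
import pandas as pd

# Hypothetical toy frame in which the first two rows share the same value.
toy = pd.DataFrame({"Code": ["A", "B", "C"], "Emissions": [1.5, 1.5, 2.0]})

# Mask approach: keep the rows that are not marked as duplicates.
mask_version = toy[~toy.duplicated("Emissions")]

# Built-in approach: keeps the first occurrence of each value by default.
drop_version = toy.drop_duplicates(subset="Emissions")

print(drop_version)
```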
