# Crop Yield Analysis

Crop yield prediction is an important agricultural problem. Yield depends primarily on weather conditions (rainfall, temperature, etc.) and pesticide use, and accurate historical yield data is important for agricultural risk management and forecasting. The basic ingredients that sustain humans are similar across the world: we eat a lot of corn, wheat, rice and other simple crops. This project applies machine learning techniques to predict the yield of the 10 most consumed crops worldwide.

These crops are:

1. Cassava
2. Maize
3. Plantains and others
4. Potatoes
5. Rice, paddy
6. Sorghum
7. Soybeans
8. Sweet potatoes
9. Wheat
10. Yams

In [1]:
# importing necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
# reading the datasets

df1 = pd.read_csv('pesticides.csv')
df2 = pd.read_csv('rainfall.csv')
df3 = pd.read_csv('temp.csv')
df4 = pd.read_csv('yield.csv')

In [3]:
# combining the datasets (row-wise concatenation)

data = pd.concat([df1, df2, df3,df4])
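
With its default `axis=0`, `pd.concat` stacks the four tables on top of each other rather than joining them, so each row of the combined frame only carries the columns of the table it came from. If the goal is one row per country and year with yield, rainfall and temperature side by side, a key-based join is one option. The sketch below is illustrative only: the frames and values are made up, the names `yield_df`, `rain_df`, `temp_df` and the shortened column names are hypothetical, and it assumes the temperature table's `country` column can be renamed to `Area` to serve as a join key.

```python
import pandas as pd

# Toy stand-ins for the real tables (all values are made up)
yield_df = pd.DataFrame({
    "Area": ["Albania", "Albania", "Zimbabwe"],
    "Year": [1990, 1991, 1990],
    "Item": ["Maize", "Maize", "Wheat"],
    "yield_hg_per_ha": [36613, 29068, 20000],
})
rain_df = pd.DataFrame({
    "Area": ["Albania", "Albania", "Zimbabwe"],
    "Year": [1990, 1991, 1990],
    "avg_rainfall_mm": [1485, 1485, 657],
})
temp_df = pd.DataFrame({
    "country": ["Albania", "Albania", "Mexico"],
    "Year": [1990, 1991, 1990],
    "avg_temp": [16.37, 15.36, 21.0],
})

# Join on the shared keys; an inner join keeps only (Area, Year) pairs
# that are present in every table
merged = (
    yield_df
    .merge(rain_df, on=["Area", "Year"], how="inner")
    .merge(temp_df.rename(columns={"country": "Area"}), on=["Area", "Year"], how="inner")
)
print(merged)
```

An inner join drops country-year pairs that are missing from any table; `how="left"` would keep every yield row and leave gaps to be imputed later.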

# Data Preprocessing

# Pesticides Data:

In [4]:
df1

Unnamed: 0,Domain,Area,Element,Item,Year,Unit,Value
0,Pesticides Use,Albania,Use,Pesticides (total),1990,tonnes of active ingredients,121.00
1,Pesticides Use,Albania,Use,Pesticides (total),1991,tonnes of active ingredients,121.00
2,Pesticides Use,Albania,Use,Pesticides (total),1992,tonnes of active ingredients,121.00
3,Pesticides Use,Albania,Use,Pesticides (total),1993,tonnes of active ingredients,121.00
4,Pesticides Use,Albania,Use,Pesticides (total),1994,tonnes of active ingredients,201.00
...,...,...,...,...,...,...,...
4344,Pesticides Use,Zimbabwe,Use,Pesticides (total),2012,tonnes of active ingredients,3375.53
4345,Pesticides Use,Zimbabwe,Use,Pesticides (total),2013,tonnes of active ingredients,2550.07
4346,Pesticides Use,Zimbabwe,Use,Pesticides (total),2014,tonnes of active ingredients,2185.07
4347,Pesticides Use,Zimbabwe,Use,Pesticides (total),2015,tonnes of active ingredients,2185.07


In [5]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4349 entries, 0 to 4348
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Domain   4349 non-null   object 
 1   Area     4349 non-null   object 
 2   Element  4349 non-null   object 
 3   Item     4349 non-null   object 
 4   Year     4349 non-null   int64  
 5   Unit     4349 non-null   object 
 6   Value    4349 non-null   float64
dtypes: float64(1), int64(1), object(5)
memory usage: 238.0+ KB


In [6]:
df1.describe()

Unnamed: 0,Year,Value
count,4349.0,4349.0
mean,2003.138883,20303.34
std,7.728044,117736.2
min,1990.0,0.0
25%,1996.0,93.0
50%,2003.0,1137.56
75%,2010.0,7869.0
max,2016.0,1807000.0


# Rainfall Data:

In [7]:
df2

Unnamed: 0,Area,Year,average_rain_fall_mm_per_year
0,Afghanistan,1985,327
1,Afghanistan,1986,327
2,Afghanistan,1987,327
3,Afghanistan,1989,327
4,Afghanistan,1990,327
...,...,...,...
6722,Zimbabwe,2013,657
6723,Zimbabwe,2014,657
6724,Zimbabwe,2015,657
6725,Zimbabwe,2016,657


In [8]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6727 entries, 0 to 6726
Data columns (total 3 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   Area                           6727 non-null   object
 1   Year                           6727 non-null   int64 
 2   average_rain_fall_mm_per_year  5953 non-null   object
dtypes: int64(1), object(2)
memory usage: 157.8+ KB


In [9]:
df2.describe()

Unnamed: 0,Year
count,6727.0
mean,2001.354839
std,9.530114
min,1985.0
25%,1993.0
50%,2001.0
75%,2010.0
max,2017.0


# Average Temperature Data:

In [10]:
df3

Unnamed: 0,Year,country,avg_temp
0,1849,Côte D'Ivoire,25.58
1,1850,Côte D'Ivoire,25.52
2,1851,Côte D'Ivoire,25.67
3,1852,Côte D'Ivoire,
4,1853,Côte D'Ivoire,
...,...,...,...
71306,2009,Mexico,21.76
71307,2010,Mexico,20.90
71308,2011,Mexico,21.55
71309,2012,Mexico,21.52


In [11]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 71311 entries, 0 to 71310
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Year      71311 non-null  int64  
 1   country   71311 non-null  object 
 2   avg_temp  68764 non-null  float64
dtypes: float64(1), int64(1), object(1)
memory usage: 1.6+ MB


In [12]:
df3.describe()

Unnamed: 0,Year,avg_temp
count,71311.0,68764.0
mean,1905.799007,16.183876
std,67.102099,7.59296
min,1743.0,-14.35
25%,1858.0,9.75
50%,1910.0,16.14
75%,1962.0,23.7625
max,2013.0,30.73


# Yield Data:

In [13]:
df4

Unnamed: 0,Domain Code,Domain,Area Code,Area,Element Code,Element,Item Code,Item,Year Code,Year,Unit,Value
0,QC,Crops,2,Afghanistan,5419,Yield,56,Maize,1961,1961,hg/ha,14000
1,QC,Crops,2,Afghanistan,5419,Yield,56,Maize,1962,1962,hg/ha,14000
2,QC,Crops,2,Afghanistan,5419,Yield,56,Maize,1963,1963,hg/ha,14260
3,QC,Crops,2,Afghanistan,5419,Yield,56,Maize,1964,1964,hg/ha,14257
4,QC,Crops,2,Afghanistan,5419,Yield,56,Maize,1965,1965,hg/ha,14400
...,...,...,...,...,...,...,...,...,...,...,...,...
56712,QC,Crops,181,Zimbabwe,5419,Yield,15,Wheat,2012,2012,hg/ha,24420
56713,QC,Crops,181,Zimbabwe,5419,Yield,15,Wheat,2013,2013,hg/ha,22888
56714,QC,Crops,181,Zimbabwe,5419,Yield,15,Wheat,2014,2014,hg/ha,21357
56715,QC,Crops,181,Zimbabwe,5419,Yield,15,Wheat,2015,2015,hg/ha,19826


In [14]:
df4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56717 entries, 0 to 56716
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Domain Code   56717 non-null  object
 1   Domain        56717 non-null  object
 2   Area Code     56717 non-null  int64 
 3   Area          56717 non-null  object
 4   Element Code  56717 non-null  int64 
 5   Element       56717 non-null  object
 6   Item Code     56717 non-null  int64 
 7   Item          56717 non-null  object
 8   Year Code     56717 non-null  int64 
 9   Year          56717 non-null  int64 
 10  Unit          56717 non-null  object
 11  Value         56717 non-null  int64 
dtypes: int64(6), object(6)
memory usage: 5.2+ MB


In [15]:
df4.describe()

Unnamed: 0,Area Code,Element Code,Item Code,Year Code,Year,Value
count,56717.0,56717.0,56717.0,56717.0,56717.0,56717.0
mean,125.650422,5419.0,111.611651,1989.66957,1989.66957,62094.660084
std,75.120195,0.0,101.278435,16.133198,16.133198,67835.932856
min,1.0,5419.0,15.0,1961.0,1961.0,0.0
25%,58.0,5419.0,56.0,1976.0,1976.0,15680.0
50%,122.0,5419.0,116.0,1991.0,1991.0,36744.0
75%,184.0,5419.0,125.0,2004.0,2004.0,86213.0
max,351.0,5419.0,489.0,2016.0,2016.0,1000000.0


# Merged Data:

In [16]:
data.shape

(139104, 15)

In [17]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 139104 entries, 0 to 56716
Data columns (total 15 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   Domain                         61066 non-null   object 
 1   Area                           67793 non-null   object 
 2   Element                        61066 non-null   object 
 3   Item                           61066 non-null   object 
 4   Year                           139104 non-null  int64  
 5   Unit                           61066 non-null   object 
 6   Value                          61066 non-null   float64
 7   average_rain_fall_mm_per_year  5953 non-null    object 
 8   country                        71311 non-null   object 
 9   avg_temp                       68764 non-null   float64
 10  Domain Code                    56717 non-null   object 
 11  Area Code                      56717 non-null   float64
 12  Element Code                   

In [18]:
data.head(5)

Unnamed: 0,Domain,Area,Element,Item,Year,Unit,Value,average_rain_fall_mm_per_year,country,avg_temp,Domain Code,Area Code,Element Code,Item Code,Year Code
0,Pesticides Use,Albania,Use,Pesticides (total),1990,tonnes of active ingredients,121.0,,,,,,,,
1,Pesticides Use,Albania,Use,Pesticides (total),1991,tonnes of active ingredients,121.0,,,,,,,,
2,Pesticides Use,Albania,Use,Pesticides (total),1992,tonnes of active ingredients,121.0,,,,,,,,
3,Pesticides Use,Albania,Use,Pesticides (total),1993,tonnes of active ingredients,121.0,,,,,,,,
4,Pesticides Use,Albania,Use,Pesticides (total),1994,tonnes of active ingredients,201.0,,,,,,,,


In [19]:
data.tail(5)

Unnamed: 0,Domain,Area,Element,Item,Year,Unit,Value,average_rain_fall_mm_per_year,country,avg_temp,Domain Code,Area Code,Element Code,Item Code,Year Code
56712,Crops,Zimbabwe,Yield,Wheat,2012,hg/ha,24420.0,,,,QC,181.0,5419.0,15.0,2012.0
56713,Crops,Zimbabwe,Yield,Wheat,2013,hg/ha,22888.0,,,,QC,181.0,5419.0,15.0,2013.0
56714,Crops,Zimbabwe,Yield,Wheat,2014,hg/ha,21357.0,,,,QC,181.0,5419.0,15.0,2014.0
56715,Crops,Zimbabwe,Yield,Wheat,2015,hg/ha,19826.0,,,,QC,181.0,5419.0,15.0,2015.0
56716,Crops,Zimbabwe,Yield,Wheat,2016,hg/ha,18294.0,,,,QC,181.0,5419.0,15.0,2016.0


# To check whether the data is balanced or imbalanced

In [20]:
#count the number of instances of each class label
count=data['Value'].value_counts()

#calculating the percentage of instances for each class label
percentage=count/data.shape[0]*100

#printing the class label counts and percentages
print("Class label counts: \n",count)
print("Class label percentages: \n",percentage)

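#determining if the data is balanced or imbalanced
#(.iloc compares the two most frequent values by position; the condition holds whenever both shares are below 60%)
if (percentage.iloc[0] < 60 and percentage.iloc[1] < 60) or (percentage.iloc[0] > 40 and percentage.iloc[1] > 40):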
    print("The data is balanced")
else:
    print("The data is imbalanced")


Class label counts: 
 100000.0    526
10000.0     484
20000.0     377
50000.0     331
40000.0     215
           ... 
169800.0      1
169600.0      1
166332.0      1
167820.0      1
18294.0       1
Name: Value, Length: 39398, dtype: int64
Class label percentages: 
 100000.0    0.378134
10000.0     0.347941
20000.0     0.271020
50000.0     0.237951
40000.0     0.154561
              ...   
169800.0    0.000719
169600.0    0.000719
166332.0    0.000719
167820.0    0.000719
18294.0     0.000719
Name: Value, Length: 39398, dtype: float64
The data is balanced
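
`Value` is a continuous quantity with 39,398 distinct values, so per-value counts say little about class balance. A hedged alternative is to look at category shares for a categorical column such as `Item`, and at summary statistics and skewness for the continuous target. The sketch below uses a small made-up frame named `toy`; it is not the notebook's data.

```python
import pandas as pd

# Toy data: a categorical label and a continuous target (values are made up)
toy = pd.DataFrame({
    "Item": ["Maize", "Maize", "Wheat", "Wheat", "Wheat", "Yams"],
    "Value": [14000.0, 14260.0, 24420.0, 22888.0, 21357.0, 100000.0],
})

# Share of rows per category: the usual notion of class balance
class_share = toy["Item"].value_counts(normalize=True)
print(class_share)

# For a continuous column, summarise the distribution instead
print(toy["Value"].describe())
print("skewness:", toy["Value"].skew())
```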


In [21]:
data.columns

Index(['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value',
       'average_rain_fall_mm_per_year', 'country', 'avg_temp', 'Domain Code',
       'Area Code', 'Element Code', 'Item Code', 'Year Code'],
      dtype='object')

In [22]:
data=data.rename(columns={'average_rain_fall_mm_per_year':'avg_rainfall_per_year'})

In [23]:
# Creating the Data Dictionary with first column being datatype.

Data_dict = pd.DataFrame(data.dtypes)
Data_dict

Unnamed: 0,0
Domain,object
Area,object
Element,object
Item,object
Year,int64
Unit,object
Value,float64
avg_rainfall_per_year,object
country,object
avg_temp,float64


In [24]:
# identifying the missing values from the dataset.

Data_dict['MissingVal'] = data.isnull().sum()
Data_dict

Unnamed: 0,0,MissingVal
Domain,object,78038
Area,object,71311
Element,object,78038
Item,object,78038
Year,int64,0
Unit,object,78038
Value,float64,78038
avg_rainfall_per_year,object,133151
country,object,67793
avg_temp,float64,70340


In [25]:
# Identifying unique values 

Data_dict['UniqueVal'] = data.nunique()
Data_dict

Unnamed: 0,0,MissingVal,UniqueVal
Domain,object,78038,2
Area,object,71311,261
Element,object,78038,2
Item,object,78038,11
Year,int64,0,275
Unit,object,78038,2
Value,float64,78038,39398
avg_rainfall_per_year,object,133151,173
country,object,67793,137
avg_temp,float64,70340,3303


In [26]:
# identifying count of the variable.

Data_dict['Count'] = data.count()
Data_dict

Unnamed: 0,0,MissingVal,UniqueVal,Count
Domain,object,78038,2,61066
Area,object,71311,261,67793
Element,object,78038,2,61066
Item,object,78038,11,61066
Year,int64,0,275,139104
Unit,object,78038,2,61066
Value,float64,78038,39398,61066
avg_rainfall_per_year,object,133151,173,5953
country,object,67793,137,71311
avg_temp,float64,70340,3303,68764


In [27]:
data.describe()

Unnamed: 0,Year,Value,avg_temp,Area Code,Element Code,Item Code,Year Code
count,139104.0,61066.0,68764.0,56717.0,56717.0,56717.0,56717.0
mean,1947.659931,59118.36,16.183876,125.650422,5419.0,111.611651,1989.66957
std,65.377468,73324.69,7.59296,75.120195,0.0,101.278435,16.133198
min,1743.0,0.0,-14.35,1.0,5419.0,15.0,1961.0
25%,1908.0,13284.75,9.75,58.0,5419.0,56.0,1976.0
50%,1974.0,32976.0,16.14,122.0,5419.0,116.0,1991.0
75%,1997.0,80472.0,23.7625,184.0,5419.0,125.0,2004.0
max,2017.0,1807000.0,30.73,351.0,5419.0,489.0,2016.0


In [28]:
# grouping based on Item

data.groupby('Item').count()

Item,Domain,Area,Element,Year,Unit,Value,avg_rainfall_per_year,country,avg_temp,Domain Code,Area Code,Element Code,Item Code,Year Code
Cassava,5718,5718,5718,5718,5718,5718,0,0,0,5718,5718,5718,5718,5718
Maize,8631,8631,8631,8631,8631,8631,0,0,0,8631,8631,8631,8631,8631
Pesticides (total),4349,4349,4349,4349,4349,4349,0,0,0,0,0,0,0,0
Plantains and others,2654,2654,2654,2654,2654,2654,0,0,0,2654,2654,2654,2654,2654
Potatoes,7876,7876,7876,7876,7876,7876,0,0,0,7876,7876,7876,7876,7876
"Rice, paddy",6469,6469,6469,6469,6469,6469,0,0,0,6469,6469,6469,6469,6469
Sorghum,5511,5511,5511,5511,5511,5511,0,0,0,5511,5511,5511,5511,5511
Soybeans,4192,4192,4192,4192,4192,4192,0,0,0,4192,4192,4192,4192,4192
Sweet potatoes,6356,6356,6356,6356,6356,6356,0,0,0,6356,6356,6356,6356,6356
Wheat,6160,6160,6160,6160,6160,6160,0,0,0,6160,6160,6160,6160,6160


# Outliers

In [29]:
# Define the features to check for outliers
features = ["Year","Value","avg_temp"]

# Check for outliers in each feature
for feature in features:
    Q1 = np.percentile(data[feature].dropna(), 25)
    Q3 = np.percentile(data[feature].dropna(), 75)
    IQR = Q3 - Q1
    upper_bound = Q3 + 1.5 * IQR
    lower_bound = Q1 - 1.5 * IQR
    outliers = (data[feature] > upper_bound) | (data[feature] < lower_bound)
    n_outliers = np.sum(outliers)
    print(f"{n_outliers} outliers detected in {feature}")

2830 outliers detected in Year
3648 outliers detected in Value
2 outliers detected in avg_temp


In [30]:
# Select numerical columns to check for outliers
num_cols = data.select_dtypes(include=[np.number]).columns.tolist()

# Calculate the IQR for each numerical column
Q1 = data[num_cols].quantile(0.25)
Q3 = data[num_cols].quantile(0.75)
IQR = Q3 - Q1

# Calculate the upper and lower bounds for each numerical column
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Identify and remove any rows that have a value outside the bounds in at least one numerical column
data_outliers_removed = data[~((data[num_cols] < lower_bound) | (data[num_cols] > upper_bound)).any(axis=1)]

# Print the original and new shape of the dataset to see how many outliers were removed
print("Original shape of data:", data.shape)
print("Shape of data after removing outliers:", data_outliers_removed.shape)


Original shape of data: (139104, 15)
Shape of data after removing outliers: (125967, 15)
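
One detail of this kind of row filter: comparisons against `NaN` evaluate to `False`, so a row is never dropped because of a missing value, only because of an observed value outside the bounds. A small sketch with made-up numbers (the frame `toy` is hypothetical):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0, np.nan],
    "b": [10.0, np.nan, 12.0, 11.0, 13.0, 12.0],
})

# Quantiles skip NaN by default
q1 = toy.quantile(0.25)
q3 = toy.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# True where an observed value lies outside the IQR fences; NaN compares as False
is_outlier = (toy < lower) | (toy > upper)
filtered = toy[~is_outlier.any(axis=1)]
print(filtered)
```

Only the row holding `100.0` is removed; the two rows containing `NaN` remain and still need imputation.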


# Missing Values

In [31]:
data.isnull().sum().sort_values(ascending=True)      

Year                          0
country                   67793
avg_temp                  70340
Area                      71311
Domain                    78038
Element                   78038
Item                      78038
Unit                      78038
Value                     78038
Domain Code               82387
Area Code                 82387
Element Code              82387
Item Code                 82387
Year Code                 82387
avg_rainfall_per_year    133151
dtype: int64

In [32]:
# Identify columns with missing values in data
missing_cols = data.columns[data.isnull().any()].tolist()

# Fill missing values with the median for numerical columns and the mode for categorical columns
for col in missing_cols:
    if pd.api.types.is_numeric_dtype(data[col]):
        data[col] = data[col].fillna(data[col].median())
    else:
        data[col] = data[col].fillna(data[col].mode()[0])

# Verify if there are any missing values left
print(data.isnull().sum().sum())

0


# Conversion

In [33]:
# Integer-typed columns
num_attr = data.select_dtypes(['int']).columns  
num_attr

Index(['Year'], dtype='object')

In [34]:
# Categorical Columns
cat_attr = data.select_dtypes('object').columns
cat_attr

Index(['Domain', 'Area', 'Element', 'Item', 'Unit', 'avg_rainfall_per_year',
       'country', 'Domain Code'],
      dtype='object')

In [35]:
data["Domain"].unique()

array(['Pesticides Use', 'Crops'], dtype=object)

In [36]:
data["Area"].unique()

array(['Albania', 'Algeria', 'Angola', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belgium-Luxembourg', 'Belize', 'Bermuda', 'Bhutan',
       'Bolivia (Plurinational State of)', 'Botswana', 'Brazil',
       'Brunei Darussalam', 'Bulgaria', 'Burkina Faso', 'Burundi',
       'Cabo Verde', 'Cameroon', 'Canada', 'Central African Republic',
       'Chad', 'Chile', 'China, Hong Kong SAR', 'China, Macao SAR',
       'China, mainland', 'China, Taiwan Province of', 'Colombia',
       'Comoros', 'Congo', 'Cook Islands', 'Costa Rica', "Côte d'Ivoire",
       'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Dominican Republic',
       'Ecuador', 'Egypt', 'El Salvador', 'Eritrea', 'Estonia',
       'Ethiopia', 'Fiji', 'Finland', 'France', 'French Polynesia',
       'Gambia', 'Germany', 'Ghana', 'Greece', 'Guatemala', 'Guinea',
       'Guinea-Bissau', 'Guyana', 'Haiti', 'Ho

In [37]:
data["Item"].unique()

array(['Pesticides (total)', 'Maize', 'Potatoes', 'Rice, paddy', 'Wheat',
       'Sorghum', 'Soybeans', 'Cassava', 'Yams', 'Sweet potatoes',
       'Plantains and others'], dtype=object)

In [38]:
data["Element"].unique()

array(['Use', 'Yield'], dtype=object)

In [39]:
data["Unit"].unique()

array(['tonnes of active ingredients', 'hg/ha'], dtype=object)

In [40]:
data["avg_rainfall_per_year"].unique()

array(['1010', '327', '1485', '89', '1030', '591', '562', '534', '1110',
       '447', '..', '1292', '83', '2666', '1422', '618', '847', '1705',
       '1039', '2200', '1146', '1028', '416', '1761', '2722', '608',
       '748', '1274', '228', '1904', '1604', '537', '1342', '322', '1522',
       '645', '3240', '900', '1543', '1646', '2926', '1348', '1113',
       '1335', '498', '677', '703', '220', '2083', '1410', '2274', '51',
       '1784', '2156', '383', '626', '788', '848', '2592', '536', '867',
       '1831', '836', '1026', '700', '1187', '652', '2350', '1996',
       '1651', '1577', '2387', '1440', '1976', '589', '1940', '1083',
       '2702', '216', '1118', '435', '832', '2051', '1668', '111', '250',
       '630', '1054', '121', '533', '1834', '641', '661', '2391', '56',
       '656', '934', '619', '1513', '1181', '2875', '1972', '282', '560',
       '92', '2041', '758', '450', '241', '346', '1032', '2091', '285',
       '1500', '778', '1732', '2280', '151', '1150', '1414', '125'
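
The rainfall column is stored as strings and includes the placeholder `'..'`, which is why pandas reports it as `object`. One possible way to recover numeric values before any encoding is `pd.to_numeric` with `errors="coerce"`. The sketch below runs on a made-up sample named `rain`, not on the notebook's data.

```python
import pandas as pd

# Made-up sample mimicking the raw column: numeric strings plus the '..' placeholder
rain = pd.Series(["1010", "327", "..", "1485", None], name="avg_rainfall_per_year")

# errors="coerce" turns anything unparseable (here '..') into NaN
rain_numeric = pd.to_numeric(rain, errors="coerce")
print(rain_numeric)
```

The coerced column is `float64`, so it can be imputed with a median like the other numerical features instead of being treated as categorical.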

In [41]:
data["country"].unique()

array(['United States', "Côte D'Ivoire", 'United Arab Emirates',
       'Nigeria', 'Ghana', 'Turkey', 'Australia', 'India', 'Egypt',
       'Algeria', 'Kazakhstan', 'Netherlands', 'China', 'Madagascar',
       'Eritrea', 'Greece', 'Iraq', 'Azerbaijan', 'Mali', 'Indonesia',
       'Thailand', 'Central African Republic', 'Spain', 'Venezuela',
       'Colombia', 'Lebanon', 'United Kingdom', 'Serbia', 'Brazil',
       'Libya', 'Germany', 'Switzerland', 'Guinea Bissau', 'Slovakia',
       'Congo', 'Belgium', 'Romania', 'Hungary', 'Burundi', 'Morocco',
       'Russia', 'Moldova', 'Sri Lanka', 'Guinea', 'Denmark', 'Argentina',
       'Senegal', 'Syria', 'Tanzania', 'Bangladesh', 'Qatar', 'Cameroon',
       'Ireland', 'South Africa', 'Tajikistan', 'Mexico', 'Pakistan',
       'Sierra Leone', 'Botswana', 'Guyana', 'Guatemala', 'Ecuador',
       'Vietnam', 'Zimbabwe', 'Finland', 'Japan', 'Sudan', 'Afghanistan',
       'Uganda', 'Taiwan', 'Nepal', 'Ukraine', 'Rwanda', 'Canada',
       'Jamaica', 

In [42]:
data["Domain Code"].unique()

array(['QC'], dtype=object)

In [43]:
# get a list of the categorical columns
cat_cols = data.select_dtypes(include=['object']).columns.tolist()

# perform label encoding
for col in cat_cols:
    data[col] = pd.factorize(data[col])[0]
data

Unnamed: 0,Domain,Area,Element,Item,Year,Unit,Value,avg_rainfall_per_year,country,avg_temp,Domain Code,Area Code,Element Code,Item Code,Year Code
0,0,0,0,0,1990,0,121.0,0,0,16.14,0,122.0,5419.0,116.0,1991.0
1,0,0,0,0,1991,0,121.0,0,0,16.14,0,122.0,5419.0,116.0,1991.0
2,0,0,0,0,1992,0,121.0,0,0,16.14,0,122.0,5419.0,116.0,1991.0
3,0,0,0,0,1993,0,121.0,0,0,16.14,0,122.0,5419.0,116.0,1991.0
4,0,0,0,0,1994,0,201.0,0,0,16.14,0,122.0,5419.0,116.0,1991.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
56712,1,167,1,4,2012,1,24420.0,0,0,16.14,0,181.0,5419.0,15.0,2012.0
56713,1,167,1,4,2013,1,22888.0,0,0,16.14,0,181.0,5419.0,15.0,2013.0
56714,1,167,1,4,2014,1,21357.0,0,0,16.14,0,181.0,5419.0,15.0,2014.0
56715,1,167,1,4,2015,1,19826.0,0,0,16.14,0,181.0,5419.0,15.0,2015.0
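
`pd.factorize` returns two things: the integer codes and the array of unique labels. The cell above keeps only the codes, so the original labels cannot be recovered afterwards. A minimal sketch of keeping the lookup as well, using a made-up series named `items`:

```python
import pandas as pd

items = pd.Series(["Maize", "Wheat", "Maize", "Yams"])

codes, uniques = pd.factorize(items)
print(codes)     # integer code per row, assigned in order of first appearance
print(uniques)   # lookup: position i holds the label for code i

# Recover the original labels from the codes
decoded = uniques.take(codes)
print(list(decoded))
```

The codes are arbitrary identifiers; their numeric order carries no meaning, which matters for models that treat inputs as ordinal.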


In [44]:
data.dtypes

Domain                     int64
Area                       int64
Element                    int64
Item                       int64
Year                       int64
Unit                       int64
Value                    float64
avg_rainfall_per_year      int64
country                    int64
avg_temp                 float64
Domain Code                int64
Area Code                float64
Element Code             float64
Item Code                float64
Year Code                float64
dtype: object

In [45]:
data = data.astype('float64')

In [46]:
data.dtypes

Domain                   float64
Area                     float64
Element                  float64
Item                     float64
Year                     float64
Unit                     float64
Value                    float64
avg_rainfall_per_year    float64
country                  float64
avg_temp                 float64
Domain Code              float64
Area Code                float64
Element Code             float64
Item Code                float64
Year Code                float64
dtype: object