In [2]:
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
from data_analysis import get_hyp1_data, plot_data, calc_corr, get_storm_data, find_for_crop
pd.options.plotting.backend = "plotly"

## Hypothesis 1: The hike in food prices is lower in developing countries than in underdeveloped countries.

### Read the food prices dataset

In [None]:
combined_data = get_hyp1_data()

### Pearson correlation coefficient and p-value for testing non-correlation

<font size = '3'>The Pearson correlation coefficient measures the linear relationship between two datasets. The calculation of the p-value relies on the assumption that each dataset is normally distributed. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets. (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html)</font>
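As a minimal sketch of how the two returned values behave, the snippet below calls `pearsonr` on synthetic data (not the food price dataset): one series that is an exact linear function of `x`, and one that is pure noise. The variable names and the random seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

# Synthetic data for illustration only; not taken from the food price dataset.
rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y_linear = 2.0 * x + 1.0      # exact linear relationship with x
y_noise = rng.normal(size=50)  # unrelated to x

# pearsonr returns the correlation coefficient and the two-sided p-value.
r_linear, p_linear = pearsonr(x, y_linear)
r_noise, p_noise = pearsonr(x, y_noise)

print(f"linear: r={r_linear:.3f}, p={p_linear:.3g}")
print(f"noise:  r={r_noise:.3f}, p={p_noise:.3g}")
```

The exact linear series gives a coefficient of essentially +1, while the noise series gives a coefficient somewhere between -1 and +1 whose p-value reflects how easily an uncorrelated system could produce it.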

## Now, let's compare the price of Bread for Kenya (Developing) and Gambia (Underdeveloped)

In [None]:
plot_data(combined_data, "Kenya", "Gambia", "Bread")
calc_corr(combined_data, "Kenya", "Gambia", "Bread")

<font size = '3'>Here, the line for the price of Bread in Kenya has a negative slope over 2013-2014, while the price of Bread in Gambia has a positive slope from 2013 to 2017. This indicates that the price hike for Bread in Gambia (Underdeveloped) is larger than in Kenya (Developing). Furthermore, the correlation coefficient is 0.08, which indicates that the two price series do not follow a similar trend across the graph.</font>
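Reading a "hike" off the slope of a line can be made more concrete by computing percentage changes. The sketch below uses made-up yearly prices for two hypothetical countries (`country_a`, `country_b`); the numbers are illustrative assumptions and are not values from `combined_data`.

```python
import pandas as pd

# Made-up yearly prices for illustration only; not taken from combined_data.
prices = pd.DataFrame(
    {
        "country_a": [10.0, 9.0, 9.5, 10.0, 11.0],
        "country_b": [10.0, 11.0, 12.5, 13.0, 15.0],
    },
    index=[2013, 2014, 2015, 2016, 2017],
)

# Year-over-year change in percent; the first year has no predecessor, so drop it.
yoy_change = prices.pct_change().dropna() * 100

# Overall change between the first and last year, in percent.
total_change = (prices.iloc[-1] / prices.iloc[0] - 1) * 100

print(yoy_change.round(1))
print(total_change.round(1))
```

A negative year-over-year value corresponds to a downward-sloping segment in the plot, and the overall change gives a single number for comparing the hike between two countries.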

# To conduct further analysis, let us test two more pairs of countries for the same commodity, i.e. Bread.
### First, we consider Tajikistan (Developing) and Guinea (Underdeveloped)

In [None]:
plot_data(combined_data, "Tajikistan", "Guinea", "Bread")
calc_corr(combined_data, "Tajikistan", "Guinea", "Bread")

### Tajikistan (Developing) and Kyrgyzstan (Underdeveloped)

In [None]:
plot_data(combined_data, "Tajikistan", "Kyrgyzstan", "Bread")
calc_corr(combined_data, "Tajikistan", "Kyrgyzstan", "Bread")

<font size = '3'>Let's consider the first graph, i.e. Tajikistan (Developing) and Guinea (Underdeveloped):

The price of Bread in Tajikistan and Guinea went through several ups and downs over the years. Here the price hike for Bread in Tajikistan (Developing) is larger than in Guinea (Underdeveloped). This is the opposite of our stated hypothesis.

Moving on to the next graph, i.e. Tajikistan (Developing) and Kyrgyzstan (Underdeveloped):

For 2006-2008, the price hike for Bread in Kyrgyzstan (Underdeveloped) is larger than in Tajikistan (Developing), whereas the opposite holds for 2008-2011 and 2012-2013.

Lastly, the correlation coefficients for the two graphs are -0.16 and 0.75 respectively.</font>


# Now, to further examine the trend, let's repeat our analysis for a different commodity, i.e. Wheat

### Comparing price of Wheat for Afghanistan (Developing) and Pakistan (Underdeveloped)

In [None]:
plot_data(combined_data, "Afghanistan", "Pakistan", "Wheat")
calc_corr(combined_data, "Afghanistan", "Pakistan", "Wheat")

<font size = '3'>For 2013-2014 and 2015-2017, the price hike for Wheat in Pakistan (Underdeveloped) is larger than in Afghanistan (Developing), which aligns with our hypothesis. In 2014-2015, however, the prices show opposite trends: the price decreases in Afghanistan whereas it increases in Pakistan. Lastly, the correlation coefficient is 0.3.</font>

### Comparing price of Wheat for Nepal (Developing) and Ethiopia (Underdeveloped)

In [None]:
plot_data(combined_data, "Nepal", "Ethiopia", "Wheat")
calc_corr(combined_data, "Nepal", "Ethiopia", "Wheat")

### Comparing price of Wheat for India (Developing) and Pakistan (Underdeveloped)

In [None]:
plot_data(combined_data, "India", "Pakistan", "Wheat")
calc_corr(combined_data, "India", "Pakistan", "Wheat")

<font size = '3'>Let us consider the first graph, i.e. Nepal (Developing) and Ethiopia (Underdeveloped):

The graph does not follow a consistent trend, and the price hikes vary considerably.

Similarly, in the second graph, i.e. India (Developing) and Pakistan (Underdeveloped):

There is a hike in the price of Wheat in Pakistan (Underdeveloped) for 2013-2014 and 2015-2016. On the other hand, the price hike for Wheat in India (Developing) is larger than in Pakistan (Underdeveloped).

Lastly, the correlation coefficients for the two graphs are -0.17 and 0.22 respectively, which is consistent with our observation.</font>

### From the above analysis, it is evident that the hikes in food prices in Developing and Underdeveloped countries are not related.
# Therefore, we reject our hypothesis

# Hypothesis 2: The occurrence of a storm has an effect on the rise in the prices of various foods

In [3]:
data = pd.read_csv(
    "/Users/dp/Desktop/SEM 1/IS597PR/2021Fall_finals/Dataset/FAOSTAT_data_all_crops_data.csv",
    usecols=['Domain', 'Area Code (FAO)', 'Area', 'Element', 'Item Code',
             'Item', 'Year', 'Months', 'Unit', 'Value'],
)
storm_data = get_storm_data()

    

| | Domain | Area Code (FAO) | Area | Element | Item Code | Item | Year | Months | Unit | Value |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 515 | Apples | 2010 | January | LCU | 481.0 |
| 1 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 515 | Apples | 2010 | February | LCU | 536.0 |
| 2 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 515 | Apples | 2010 | March | LCU | 485.0 |
| 3 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 515 | Apples | 2010 | April | LCU | 461.0 |
| 4 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 515 | Apples | 2010 | May | LCU | 505.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3148 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 15 | Wheat | 2019 | August | LCU | 160.0 |
| 3149 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 15 | Wheat | 2019 | September | LCU | 157.0 |
| 3150 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 15 | Wheat | 2019 | October | LCU | 164.0 |
| 3151 | Producer Prices | 231 | United States of America | Producer Price (LCU/tonne) | 15 | Wheat | 2019 | November | LCU | 161.0 |


In [5]:
all_crop_names = ['Apples','Lentils', 'Wheat', 'Peas, dry', 'Peas, green', 'Pumpkins, squash and gourds', 'Sunflower seed', 'Tangerines, mandarins, clementines, satsumas', 'Cauliflowers and broccoli']
for crop in all_crop_names:
    f1, f2, pv = find_for_crop(crop, data, storm_data)
    f1.show()
    f2.show()
    print(pv, end="\n\n")

(0.0645762757156055, 0.6662923807917724)



(-0.2536841959495932, 0.07245232665893375)



(-0.01720904843723353, 0.8943860992396472)



(-0.3859438001973159, 0.022028332664139595)



(-0.8056571537462837, 0.05298363156726233)



(0.6000387758539151, 0.0875969467968427)



(0.24254283689685813, 0.08636881540895555)



(-0.5269022619721561, 0.029765893149198783)



(-0.3538026418657479, 0.040094962115948256)



#### Here we observe that some of the crops, such as Cauliflowers and broccoli and Peas, dry, have a p-value (i.e. the second value of the printed tuple) lower than 0.05, which might seem to support the hypothesis. But most of the crops, such as Wheat and Apples, do not follow this trend: their p-value is greater than 0.05 and their correlation is close to 0. So we fail to show that storms have a significant effect on food crop prices.
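The printed pairs can be tallied against the 0.05 threshold directly. The sketch below copies the nine tuples from the output above and assumes they appear in the same order as `all_crop_names`, since the loop prints one tuple per crop; the `results` dictionary and `alpha` are names introduced here for illustration.

```python
# (correlation, p-value) pairs copied from the cell output above.
# Assumption: they are in the same order as all_crop_names.
results = {
    "Apples": (0.0645762757156055, 0.6662923807917724),
    "Lentils": (-0.2536841959495932, 0.07245232665893375),
    "Wheat": (-0.01720904843723353, 0.8943860992396472),
    "Peas, dry": (-0.3859438001973159, 0.022028332664139595),
    "Peas, green": (-0.8056571537462837, 0.05298363156726233),
    "Pumpkins, squash and gourds": (0.6000387758539151, 0.0875969467968427),
    "Sunflower seed": (0.24254283689685813, 0.08636881540895555),
    "Tangerines, mandarins, clementines, satsumas": (-0.5269022619721561, 0.029765893149198783),
    "Cauliflowers and broccoli": (-0.3538026418657479, 0.040094962115948256),
}

alpha = 0.05  # significance threshold used in the discussion above

# Crops whose p-value falls below the threshold.
significant = [crop for crop, (r, p) in results.items() if p < alpha]

print(f"{len(significant)} of {len(results)} crops have p < {alpha}:")
for crop in significant:
    r, p = results[crop]
    print(f"  {crop}: r={r:.2f}, p={p:.3f}")
```

Under that ordering assumption, three of the nine crops fall below the threshold, and the remaining six do not.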

### Therefore, we reject our hypothesis