**CONTENT**

* About Data
* Importing Libraries
* Importing Data

**Visuals**

* Distribution of food production using Area Abbreviation
* Food items graph
* Amount of Food and Feed production
* Distribution of food production around the globe using Area

**About SVM**

* Linear SVM
* Non-linear SVM

**Implementation**

* Splitting Data
* Fitting classifier
* Report

* Extra Notes
* Hyperparameter Tuning

**About Data**

The Food and Agriculture Organization (FAO) is a specialized agency of the United Nations that leads international efforts to defeat hunger.
Their goal is to achieve food security for all and make sure that people have regular access to enough high-quality food to lead active, healthy lives. With over 194 member states, FAO works in over 130 countries worldwide. This dataset provides an insight into our worldwide food production, focusing on a comparison between food produced for human consumption and feed produced for animals.

**Importing Libraries**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls

from sklearn import svm
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report


pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

**Importing Data**

In [None]:
food=pd.read_csv("../input/FAO.csv" , encoding="latin1")

In [None]:
food=food.rename(columns={'Area Code':'area_code','Item Code':'item_code','Element Code':'element_code','Area Abbreviation':'area_abbreviation'})
food.shape

In [None]:
food['Unit']=food['Unit'].apply(lambda x:int(x.strip('tonnes')))
food.head()

**WE ARE GOING TO HAVE SOME VISUALS OF OUR DATA**

Food production around the globe

In [None]:
area=food['area_abbreviation'].value_counts()
labels=(np.array(area.index))
value=(np.array((area/area.sum())*100))
trace=go.Pie(labels=labels,values=value)
layout=go.Layout(title='area')
data=[trace]
fig=go.Figure(data=data,layout=layout)
py.iplot(fig,filename='area')

We can see that, out of all the areas, CHN (China) has the largest share in comparison to all other places. Note that the chart counts the number of records per area abbreviation, not the tonnage produced.

**Item wise distribution**

In [None]:
plt.figure(figsize=(24,12))
item=food['Item'].value_counts()[:50]
sns.barplot(x=item.values,y=item.index)
sns.despine(left=True,right=True)
plt.show()

Production of milk (excluding butter) is significantly higher than that of any other item, followed by eggs and cereals.

**Distribution between elements (Food/Fodder)**

In [None]:
ele=food['Element'].value_counts()
labels=(np.array(ele.index))
values=(np.array((ele/ele.sum())*100))
trace=go.Pie(labels=labels,values=values)
layout=go.Layout(title="element")
data=[trace]
fig=go.Figure(data=data,layout=layout)
py.iplot(fig,filename='element')

We can see that food production is much higher, which is not much of a surprise.

**Area wise distribution of food production**

In [None]:
wor_df=pd.DataFrame(food['Area'].value_counts()).reset_index()
wor_df.columns=['cont','Production']
wor_df=wor_df.reset_index().drop('index',axis=1)

data = [ dict(
        type = 'choropleth',
        locations = wor_df['cont'],
        locationmode = 'country names',
        z = wor_df['Production'],
        text = wor_df['cont'],
        colorscale = [[0,"rgb(5, 10, 172)"],[0.35,"rgb(40, 60, 190)"],[0.5,"rgb(70, 100, 245)"],\
            [0.6,"rgb(90, 120, 245)"],[0.7,"rgb(106, 137, 247)"],[1,"rgb(220, 220, 220)"]],
        autocolorscale = False,
        reversescale = True,
        marker = dict(
            line = dict (
                color = 'rgb(180,180,180)',
                width = 0.5
            ) ),
        colorbar = dict(
            autotick = False,
            tickprefix = ' ',
            title = 'food production'),
      ) ]

layout = dict(
    title = 'production of food around globe',
    geo = dict(
        showframe = False,
        showcoastlines = True,
        projection = dict(
            type = 'Mercator'
        )
    )
)

fig = dict( data=data, layout=layout )
py.iplot( fig, validate=False, filename='world-map' )

Here we see an interesting thing: area_abbreviation and Area are supposed to show similar data, but they do not.
Looking by Area, we get Spain as our highest producer followed by Italy, which is not the same story as the one told by area_abbreviation,
which shows China as the highest producer. Let's see why this happens.

In [None]:
food['Area'].value_counts()[:5]

In [None]:
food['area_abbreviation'].value_counts()[:5]

In [None]:
j=food['area_abbreviation'].unique()
len(j)

In [None]:
i=food['Area'].unique()
len(i)

We can see that the lengths of the two are different.
If we look at the contents of both lists, we find that the Area field contains several sub-parts of China, such as 'China, Hong Kong SAR',
'China, Macao SAR', 'China, mainland' and 'China, Taiwan Province of', which are aggregated in the abbreviation field as CHN.

This graph hence shows a more detailed area wise distribution of production.
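
To check which abbreviations cover more than one Area, one option is to group by the abbreviation and count the distinct area names. The following is a minimal sketch on a small made-up frame: the rows are illustrative and not taken from FAO.csv, and `toy`, `areas_per_abbr` and `shared` are hypothetical names.

```python
import pandas as pd

# Illustrative rows only; the column names match the renamed FAO frame.
toy = pd.DataFrame({
    'area_abbreviation': ['CHN', 'CHN', 'CHN', 'ESP', 'ITA'],
    'Area': ['China, mainland', 'China, Hong Kong SAR',
             'China, mainland', 'Spain', 'Italy'],
})

# Number of distinct Area names behind each abbreviation.
areas_per_abbr = toy.groupby('area_abbreviation')['Area'].nunique()

# Abbreviations that aggregate several areas.
shared = areas_per_abbr[areas_per_abbr > 1]
print(shared)
```

On the real data the same idea would be `food.groupby('area_abbreviation')['Area'].nunique()`.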

**About SVM**

Support vector machines (SVMs) are a particularly powerful and flexible class of supervised algorithms for both classification and regression. In this section, we will develop the intuition behind support vector machines and their use in classification problems.

To classify two different classes, we simply try to draw a line that separates them.

![img1](http://www.ippatsuman.com/wp-content/uploads/2014/08/scatterplot.png)

Support vectors are simply the data points closest to the separating boundary, and they help us determine the plane that segregates the classes.
The line which we see separating the two classes is called a hyperplane.
There is also the task of selecting the proper hyperplane. In the above case a single hyperplane looks sufficient, but in general many hyperplanes can separate the same classes.

![img3](https://www.hackerearth.com/blog/wp-content/uploads/2017/02/multple-hyperplanes.jpg)

The margin is the distance between the separating line and the nearest points of each class. In support vector machines, the line that maximizes this margin is the one we choose as the optimal model. Support vector machines are an example of such a maximum margin estimator.

![img3](https://www.hackerearth.com/blog/wp-content/uploads/2017/02/Margin.png)
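
The maximum margin idea can be sketched on a tiny example. The points below are made up for illustration (they are not part of the FAO data), and `clf_toy` and `margin_width` are hypothetical names. For a linear SVM with weight vector `w`, the width of the margin is `2 / ||w||`.

```python
import numpy as np
from sklearn import svm

# Two small, linearly separable point clouds (made-up data).
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf_toy = svm.SVC(kernel='linear', C=1e6)
clf_toy.fit(X, y)

w = clf_toy.coef_[0]                     # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print(clf_toy.support_vectors_)          # points that define the margin
print(margin_width)
```

Only the support vectors influence the fitted hyperplane; moving any other point (without crossing the margin) leaves the model unchanged.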

**For non-linear model**

It is not always possible to use lines or planes; sometimes a nonlinear region is required to separate the classes. Support vector machines handle such situations with a technique called the kernel trick.
Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable one. This is mostly useful in non-linear separation problems. Simply put, the kernel does some extremely complex data transformations and then finds a way to separate the data based on the labels or outputs you've defined.

![img4](https://www.hackerearth.com/blog/wp-content/uploads/2017/02/kernel.png)
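
As a rough illustration of why kernels help, the sketch below compares a linear kernel with an RBF kernel on two concentric rings. The data is synthetic (generated with scikit-learn's `make_circles`, not taken from the FAO dataset), and the variable names are hypothetical.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Two concentric rings: no straight line can separate them.
X_ring, y_ring = make_circles(n_samples=200, factor=0.3, noise=0.05,
                              random_state=0)

# Training accuracy with a linear kernel and with an RBF kernel.
linear_acc = svm.SVC(kernel='linear').fit(X_ring, y_ring).score(X_ring, y_ring)
rbf_acc = svm.SVC(kernel='rbf', gamma='scale').fit(X_ring, y_ring).score(X_ring, y_ring)

print(linear_acc, rbf_acc)
```

The RBF kernel should separate the rings far better than the linear one, because it implicitly works in a space where the inner and outer ring can be split.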

**Implementation**

In [None]:
food=food.rename(columns={'Area Code':'area_code','Item Code':'item_code','Element Code':'element_code'})
food.shape

We will drop columns like area_abbreviation and Area because we already have equivalent numeric fields such as area_code and element_code.

In [None]:
food.drop(food.columns[[0,2,4,6]],axis=1,inplace=True)

In [None]:
food.describe()

We can see from the above report that we have lots of null values from Y1961 until the end.

In [None]:
food['Y1961'].isnull().sum()

Removing all null values

In [None]:
for i in range(6,len(food.columns)):
    val=food.columns[i]
    food=food[np.isfinite(food[val])]
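
The loop above keeps only the rows where every year column holds a finite number. For missing values, a single `dropna` call with `subset` should give the same result. This is a minimal sketch on a made-up frame (`toy`, `looped` and `dropped` are hypothetical names); note that `np.isfinite` also rejects infinities, which `dropna` does not.

```python
import numpy as np
import pandas as pd

# Made-up frame with one identifier column and two year columns.
toy = pd.DataFrame({
    'area_code': [1, 2, 3],
    'Y1961': [10.0, np.nan, 5.0],
    'Y1962': [12.0, 7.0, np.nan],
})
year_cols = list(toy.columns[1:])

# Loop version, same idea as the cell above.
looped = toy.copy()
for col in year_cols:
    looped = looped[np.isfinite(looped[col])]

# Single-call version: drop rows with a NaN in any year column.
dropped = toy.dropna(subset=year_cols)

print(dropped)
```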

These are our two target classes:

5142-**FOOD**

5521-**FODDER**

In [None]:
food['element_code'].unique()

Setting target variable

In [None]:
target=food['element_code']
food=food.drop('element_code',axis=1)

Splitting data into train and test

In [None]:
x_train,x_test,y_train,y_test=train_test_split(food,target,test_size=0.2)
print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)

**Fitting classifier**

**Extra notes**

The SVM is an approximate implementation of a theoretical bound on the generalisation performance that is independent of the dimensionality of the feature space. This means that there is a good reason to suggest that performing feature selection might not make the performance of the classifier any better.

The reason that the SVM works is because it uses regularisation (like ridge regression) to avoid over-fitting, so provided you set the regularisation parameter C properly (e.g. using cross-validation), the performance ought to be good without feature selection.

The thing that is often not mentioned about feature selection is that it can easily make performance worse. The reason for this is that the more choices about the model that are made by optimising some statistic evaluated over the training sample, the more likely you are to over-fit the training sample, and feature selection often ends up making many more choices about the model (worst case 2^d, where d is the number of features). In his monograph on feature subset selection for regression[0], Miller suggests that if you are primarily interested in generalisation performance, then use ridge regression instead and don't do any feature selection. This is in accord with my experience; I think the reason is that it is more difficult to over-fit with one continuous parameter tuned using cross-validation than by choosing the best of the 2^d combinations of features.

The C parameter tells the SVM optimization how much you want to avoid misclassifying each training example. For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points. For very tiny values of C, you should get misclassified examples, often even if your training data is linearly separable.
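
The effect of C on the margin can be checked on synthetic data. The sketch below is only an illustration: the clusters come from scikit-learn's `make_blobs`, the two C values are arbitrary, and `margins` is a hypothetical name. It fits a linear SVM for each C and reports the margin width `2 / ||w||`.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Two overlapping clusters (synthetic data).
X_blob, y_blob = make_blobs(n_samples=200, centers=2, cluster_std=1.5,
                            random_state=0)

margins = {}
for C in (0.01, 10):
    model = svm.SVC(kernel='linear', C=C).fit(X_blob, y_blob)
    # Margin width of a linear SVM is 2 / ||w||.
    margins[C] = 2.0 / np.linalg.norm(model.coef_[0])

print(margins)
```

The small C should produce the wider margin, since the optimizer is then more willing to tolerate points inside the margin or on the wrong side.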

**Hyperparameter Tuning**

In [None]:
parameters = {'C': [1,10],'gamma':[0.001,0.01,1]}
clf=svm.SVC() 
clf = GridSearchCV(clf, parameters)
clf.fit(x_train,y_train)
classifier = clf.cv_results_

In [None]:
print(classifier)

In [None]:
clf.best_estimator_

In [None]:
clf.best_params_

When running my notebook manually, I get C=1 and gamma=0.001,

but after committing, the values change to C=1 and gamma=1 (not correct).

I have already tried C=0.01, 0.1, 1 and 10; just for the sake of low training time, I am using the best C value from those results.
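
One possible reason for results differing between runs is that `train_test_split` is called above without a `random_state`, so every run trains on a different split. This is an assumption, not something verified on this notebook. A minimal sketch on made-up data of how a fixed seed makes the split repeatable (the seed value 42 is arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(40).reshape(20, 2)   # made-up features
y_demo = np.array([0, 1] * 10)          # made-up labels

# Same seed -> identical split on every call.
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2,
                                         random_state=42)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2,
                                         random_state=42)

print(np.array_equal(a_test, b_test))
```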

In [None]:
clf=svm.SVC(C=1,gamma=0.001)
grid_clf=clf.fit(x_train,y_train)

In [None]:
pred=clf.predict(x_test)

In [None]:
print(classification_report(y_test,pred))

In [None]:
print(confusion_matrix(y_test,pred))


**Any Feedback is appreciated**