# The program of linear regression

## Training data set
### We selected 11 features to describe one year's data for a country, and used Global Hunger Index (GHI) as label data.

### Features considered
### 1. Population movement
### &emsp;&emsp;- Population Growth - Change in rate actual number is good 
### &emsp;&emsp;- Tourism / Visas in/out 

### 2. Climate (Check NOAA.gov site for widespread data)	
### &emsp;&emsp;- Occurrence of natural disasters:  floods, droughts, earthquake: http://www.emdat.be/database
### &emsp;&emsp;- Land ( agricultural area, soil type): http://www.fao.org/faostat/en/#data/RL

### 3. Trade / Economics
### &emsp;&emsp;- MSCI(monthly): https://www.msci.com/en/end-of-day-data-search
### &emsp;&emsp;- Economic Freedom Index: http://www.heritage.org/index/
### &emsp;&emsp;- Food Price Index (FAO.org Statistics): http://www.fao.org/faostat/en/#data/FS

### 4. Food Stress index)
### &emsp;&emsp;- Per capita food production variability(USD per person)
### &emsp;&emsp;- Food Access
#### &emsp;&emsp;&emsp;> Percentage of Paved Roads
#### &emsp;&emsp;&emsp;> Road Density
#### &emsp;&emsp;&emsp;> Rail line density per 100 square km of land area


### The example data is as follow:
 | year | popu_growth | tourism | disaster | land	| msci | economic_freedom	| food_price | capita_production | paved_road | road_density | rail_line | GHI |
 | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ----| ---- |
 |2003 | 0.97 | 5903000 | 2 | 30041 | 185.9475 | 56.7 | 5.45 | 15200 | 56.9 | 12.9 | 0.4 | 6.4 |

In [6]:
'''
Created on Mar 26, 2017

@author: Cheng-lin Li
'''
import sys
import numpy as np
import math
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.linear_model import LinearRegression


    

<bound method LinearRegression.get_params of LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)>
('score =', 0.99228738329541855)
('W=', array([   9.39903755,    2.25455302,    1.85190233,   22.66776368,
         -1.96410922,   16.11119   ,   -4.56043802,    5.95165782,
         14.48453394, -458.98465133, -205.04612532]))


### We used scikit-learn library and called the Linear Regression function to apply on training data, and get a result of weighting array. The weight represents importance / influence of each feature. (e.g. weight of population_growth is 9.399)

In [None]:
INPUT_FILE = '/home/dianba/summarize2.csv' #Default input file name
ORIG_STDOUT = None
#OUTPUT_FILE = 'output.txt' # OUTPUT_FILE COULD BE 'OUTPUT_FILE = None' for console or file name (e.g. 'OUTPUT_FILE = 'output.txt') for file.'
OUTPUT_FILE = None # OUTPUT_FILE COULD BE 'OUTPUT_FILE = None' for console or file name (e.g. 'OUTPUT_FILE = 'output.txt') for file.'

def getInputData(filename):
    _data = np.genfromtxt(filename, delimiter = ',')
    _X = _data[1:, 1:12] # variable numbers are 11
    _Y = _data[1:, 12]  # column for label data
    return _X, _Y    

if __name__ == '__main__':

    input_file = ''
    output_file = ''
    
    if len(sys.argv) < 2 : 
        print('Usage of Linear Regression: %s input_matrix.dat output.txt '%(sys.argv[0]))
        print('    input_matrix is the input variable matrix.')
        print('    output.txt will output weights for each dimensions')

    else:
        input_file = sys.argv[1] if len(sys.argv) > 1 else INPUT_FILE
        output_file = sys.argv[2] if len(sys.argv) > 2 else OUTPUT_FILE
        
    
    X, Y = getInputData("summarize2.csv") #Get column 1,2 as X, column 3 as Z
    
    lr = LinearRegression(normalize = False)
 
    lr.fit(X, Y)
    print(str(lr.get_params))
    score = lr.score(X, Y)    
    W = lr.coef_
    print('score =', score)
    print ('W=', W )

### Offer prediction for Food Security Index

In [7]:
New_Y = lr.predict(X)
print('Prediction:%s'%(New_Y))

Prediction:[ 6.32979243  6.49549067  6.43982291  6.25310746  6.9372728   6.49451374
  5.9079603   5.67032199  5.80827114  4.93411603  5.20784129  9.64012428
  9.33136497]
