# Counterfeit Medicines Sales Prediction
Counterfeit medicines are fake medicines that are contaminated, contain the wrong active ingredient, or contain no active ingredient at all. Some contain the right active ingredient but at the wrong dose. Counterfeit drugs are illegal and harmful to health. An estimated 10% of the world's medicine is counterfeit, and the problem is worse in developing countries, where up to 30% of medicines are counterfeit.

Millions of pills, bottles and sachets of counterfeit and illegal medicines are traded across the world. The World Health Organization (WHO) is working with the International Criminal Police Organization (Interpol) to dislodge the criminal networks raking in billions of dollars from this cynical trade.

Despite these efforts, counterfeit medicine rackets keep appearing. Deploying resources against all of them at once would spread those resources too thin and render them ineffective. The government has therefore decided to focus first on illegal operations of high net worth rather than trying to control all of them. To do that, it has collected data that can be used to predict sales figures from an illegal operation's characteristics.


# Objective
Predict the sales figures of counterfeit medicine selling operations.

# Data Files
Train Dataset = counterfeit_train.csv

Test Dataset = counterfeit_test.csv

# Import the required libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt

# Read both train and test datasets

In [2]:
cf_train = pd.read_csv("counterfeit_train.csv")
cf_test  = pd.read_csv("counterfeit_test.csv")
cf_test["Counterfeit_Sales"]=np.nan
cf_test.head(6)

| | Medicine_ID | Counterfeit_Weight | DistArea_ID | Active_Since | Medicine_MRP | Medicine_Type | SidEffect_Level | Availability_rating | Area_Type | Area_City_Type | Area_dist_level | Counterfeit_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HLZ81 | NaN | Area027 | 1983 | 85.5328 | Antibiotics | mild | 0.112747 | CityLimits | Tier 3 | Medium | NaN |
| 1 | ECE94 | 13.45 | Area045 | 2000 | 257.146 | OralContraceptives | mild | 0.144446 | DownTown | Tier 2 | Unknown | NaN |
| 2 | SAD14 | 7.1 | Area045 | 2000 | 98.1172 | Antipyretics | mild | 0.144221 | DownTown | Tier 2 | Unknown | NaN |
| 3 | EQV63 | 18.3 | Area010 | 1996 | 135.373 | Tranquilizers | mild | 0.100388 | MidTownResidential | Tier 3 | Unknown | NaN |
| 4 | AIR10 | NaN | Area019 | 1983 | 112.8016 | OralContraceptives | mild | 0.022585 | MidTownResidential | Tier 1 | Small | NaN |
| 5 | LIC37 | 14.45 | Area010 | 1996 | 190.2976 | OralContraceptives | mild | 0.074382 | MidTownResidential | Tier 3 | Unknown | NaN |


In [3]:
cf_train.head(6)

| | Medicine_ID | Counterfeit_Weight | DistArea_ID | Active_Since | Medicine_MRP | Medicine_Type | SidEffect_Level | Availability_rating | Area_Type | Area_City_Type | Area_dist_level | Counterfeit_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RRA15 | 13.1 | Area046 | 1995 | 160.2366 | Antimalarial | critical | 0.070422 | DownTown | Tier 1 | Small | 1775.5026 |
| 1 | YVV26 | NaN | Area027 | 1983 | 110.4384 | Mstablizers | mild | 0.013 | CityLimits | Tier 3 | Medium | 3069.152 |
| 2 | LJC15 | 9.025 | Area046 | 1995 | 259.4092 | Cardiac | mild | 0.060783 | DownTown | Tier 1 | Small | 2603.092 |
| 3 | GWC40 | 11.8 | Area046 | 1995 | 99.983 | OralContraceptives | mild | 0.065555 | DownTown | Tier 1 | Small | 1101.713 |
| 4 | QMN13 | NaN | Area019 | 1983 | 56.4402 | Hreplacements | critical | 0.248859 | MidTownResidential | Tier 1 | Small | 158.9402 |
| 5 | JDG81 | 8.775 | Area045 | 2000 | 165.5656 | Antiseptics | mild | 0.088881 | DownTown | Tier 2 | Unknown | 3047.8464 |


In [4]:
cf_train["data"]="train"
cf_test["data"]="test"
cf_all= pd.concat([cf_train,cf_test],axis=0)
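
Stacking the two frames with a `data` marker lets every preprocessing step run once on both parts. The sketch below walks through that round trip on toy frames with made-up values. Passing `ignore_index=True` is an optional choice that the cell above does not make; it is shown here only because it avoids repeated row labels in the combined frame.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real train and test files (made-up values).
toy_train = pd.DataFrame({"Medicine_MRP": [160.2, 110.4, 259.4],
                          "Counterfeit_Sales": [1775.5, 3069.2, 2603.1]})
toy_test = pd.DataFrame({"Medicine_MRP": [85.5, 257.1]})

# Give the test frame the target column so both frames share a schema.
toy_test["Counterfeit_Sales"] = np.nan
toy_train["data"] = "train"
toy_test["data"] = "test"

# ignore_index=True renumbers rows 0..n-1 instead of repeating labels.
toy_all = pd.concat([toy_train, toy_test], axis=0, ignore_index=True)

# The marker column recovers each part after shared preprocessing.
train_part = toy_all[toy_all["data"] == "train"].drop(columns="data")
test_part = toy_all[toy_all["data"] == "test"].drop(columns=["data", "Counterfeit_Sales"])
print(train_part.shape, test_part.shape)
```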

In [5]:
cf_all.dtypes

Medicine_ID             object
Counterfeit_Weight     float64
DistArea_ID             object
Active_Since             int64
Medicine_MRP           float64
Medicine_Type           object
SidEffect_Level         object
Availability_rating    float64
Area_Type               object
Area_City_Type          object
Area_dist_level         object
Counterfeit_Sales      float64
data                    object
dtype: object

# Data Preparation

In [6]:
cf_all.drop('Medicine_ID',inplace=True,axis=1)

In [7]:
cf_all.DistArea_ID.nunique()

10

In [8]:
list(zip(cf_all.columns, cf_all.nunique()))

[('Counterfeit_Weight', 415),
 ('DistArea_ID', 10),
 ('Active_Since', 9),
 ('Medicine_MRP', 5970),
 ('Medicine_Type', 16),
 ('SidEffect_Level', 2),
 ('Availability_rating', 7884),
 ('Area_Type', 4),
 ('Area_City_Type', 3),
 ('Area_dist_level', 4),
 ('Counterfeit_Sales', 3142),
 ('data', 2)]

In [9]:
cf_all.Active_Since = cf_all.Active_Since.astype(str)
cf_all.Active_Since.dtype

dtype('O')

In [10]:
cat_cols=cf_all.select_dtypes(['object']).columns
# Drop the last entry, the 'data' marker column, which is not a feature
cat_cols=cat_cols[:-1]
print(cat_cols)

Index(['DistArea_ID', 'Active_Since', 'Medicine_Type', 'SidEffect_Level',
       'Area_Type', 'Area_City_Type', 'Area_dist_level'],
      dtype='object')


In [11]:
dummies = pd.get_dummies(cf_all[cat_cols], drop_first=True)

In [12]:
dummies.shape

(8523, 41)

In [13]:
dummies.head()

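| | DistArea_ID_Area013 | DistArea_ID_Area017 | DistArea_ID_Area018 | DistArea_ID_Area019 | DistArea_ID_Area027 | DistArea_ID_Area035 | DistArea_ID_Area045 | DistArea_ID_Area046 | DistArea_ID_Area049 | Active_Since_1985 | ... | Medicine_Type_Tranquilizers | SidEffect_Level_mild | Area_Type_DownTown | Area_Type_Industrial | Area_Type_MidTownResidential | Area_City_Type_Tier 2 | Area_City_Type_Tier 3 | Area_dist_level_Medium | Area_dist_level_Small | Area_dist_level_Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |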
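
With `drop_first=True`, a categorical column with k levels becomes k-1 indicator columns, because the first level is implied when all the others are 0. This avoids perfectly collinear dummies in a linear model that has an intercept. A minimal sketch on a toy frame (the rows are made up; the column names and levels are borrowed from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "Area_City_Type": ["Tier 1", "Tier 2", "Tier 3", "Tier 1"],
    "SidEffect_Level": ["mild", "critical", "mild", "mild"],
})

full = pd.get_dummies(toy)                      # one column per level
reduced = pd.get_dummies(toy, drop_first=True)  # first level of each column dropped

print(list(full.columns))
print(list(reduced.columns))
```

The same counting explains the 41 dummy columns above: each of the seven categorical columns contributes one column fewer than its number of levels.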


In [14]:
# Fill missing feature values with the column mean computed on training rows only
for col in cf_all.columns:
    if col not in ['Counterfeit_Sales','data'] and cf_all[col].isnull().sum()>0:
        cf_all.loc[cf_all[col].isnull(),col]=cf_all.loc[cf_all['data']=='train',col].mean()
cf_all.isnull().sum()

Counterfeit_Weight        0
DistArea_ID               0
Active_Since              0
Medicine_MRP              0
Medicine_Type             0
SidEffect_Level           0
Availability_rating       0
Area_Type                 0
Area_City_Type            0
Area_dist_level           0
Counterfeit_Sales      1705
data                      0
dtype: int64
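
The imputation step takes the mean from the training rows only and applies it to missing values in both parts, so no test information leaks into the fill value. A small sketch of the same idea on made-up numbers, written with `fillna` as one possible equivalent of the loop above:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Counterfeit_Weight": [10.0, np.nan, 14.0, np.nan, 30.0],
    "data": ["train", "train", "train", "test", "test"],
})

# Mean over training rows only: (10 + 14) / 2 = 12; the test value 30 is ignored.
train_mean = toy.loc[toy["data"] == "train", "Counterfeit_Weight"].mean()

# Missing values in both train and test rows receive the training mean.
toy["Counterfeit_Weight"] = toy["Counterfeit_Weight"].fillna(train_mean)
print(toy)
```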

In [15]:
cf_all.drop(columns=cat_cols, inplace=True)

In [16]:
cf_all.head()

| | Counterfeit_Weight | Medicine_MRP | Availability_rating | Counterfeit_Sales | data |
|---|---|---|---|---|---|
| 0 | 13.1 | 160.2366 | 0.070422 | 1775.5026 | train |
| 1 | 14.115057 | 110.4384 | 0.013 | 3069.152 | train |
| 2 | 9.025 | 259.4092 | 0.060783 | 2603.092 | train |
| 3 | 11.8 | 99.983 | 0.065555 | 1101.713 | train |
| 4 | 14.115057 | 56.4402 | 0.248859 | 158.9402 | train |


In [17]:
cf_all = pd.concat([cf_all, dummies], axis = 1)
cf_all.head()

| | Counterfeit_Weight | Medicine_MRP | Availability_rating | Counterfeit_Sales | data | DistArea_ID_Area013 | DistArea_ID_Area017 | DistArea_ID_Area018 | DistArea_ID_Area019 | DistArea_ID_Area027 | ... | Medicine_Type_Tranquilizers | SidEffect_Level_mild | Area_Type_DownTown | Area_Type_Industrial | Area_Type_MidTownResidential | Area_City_Type_Tier 2 | Area_City_Type_Tier 3 | Area_dist_level_Medium | Area_dist_level_Small | Area_dist_level_Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13.1 | 160.2366 | 0.070422 | 1775.5026 | train | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 14.115057 | 110.4384 | 0.013 | 3069.152 | train | 0 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| 2 | 9.025 | 259.4092 | 0.060783 | 2603.092 | train | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 11.8 | 99.983 | 0.065555 | 1101.713 | train | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 14.115057 | 56.4402 | 0.248859 | 158.9402 | train | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |


In [18]:
# drop() returns new frames, so no slice of cf_all is modified in place
cf_train=cf_all[cf_all["data"]=="train"].drop(columns="data")
cf_test=cf_all[cf_all["data"]=="test"].drop(columns=["data","Counterfeit_Sales"])


In [19]:
cf_train.shape, cf_test.shape

((6818, 45), (1705, 44))

In [20]:
from sklearn.model_selection import train_test_split
t1,t2=train_test_split(cf_train,test_size=0.2,random_state=123)

In [21]:
x_train1=t1.drop('Counterfeit_Sales',axis=1)
y_train1=t1['Counterfeit_Sales']
x_train2=t2.drop('Counterfeit_Sales',axis=1)
y_train2=t2['Counterfeit_Sales']

# Data Modeling 

In [22]:
# Applying linear regression
from sklearn.linear_model import LinearRegression

In [23]:
lm=LinearRegression()

In [24]:
lm.fit(x_train1,y_train1)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [25]:
p_test=lm.predict(x_train2)

# Root mean squared error (RMSE) on the hold-out split
residual=p_test-y_train2
rmse_lm=np.sqrt(np.dot(residual,residual)/len(p_test))

mn=rmse_lm
mn

1131.4524237397761

In [26]:
# Score = 1 - error/1660. Here the error is RMSE; the later models use MAE.
print(1-(mn/1660))

0.31840215437362884
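
The score printed above is `1 - error/1660`. The constant 1660 is taken from the cell as written; the notebook does not say where it comes from. The linear regression cell feeds RMSE into this formula, while the later cells feed MAE. The sketch below, on made-up numbers, computes both errors from the same residuals to show how the choice affects the score; the helper name `scaled_score` is hypothetical.

```python
import numpy as np

def scaled_score(error, scale=1660):
    """Score used throughout this notebook: 1 - error / scale."""
    return 1 - error / scale

# Made-up actual and predicted sales, for illustration only.
y_true = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y_pred = np.array([1100.0, 1900.0, 3400.0, 3600.0])

residual = y_pred - y_true
rmse = np.sqrt(np.mean(residual ** 2))
mae = np.mean(np.abs(residual))

# RMSE is never smaller than MAE, so the RMSE-based score is never higher.
print(rmse, mae)
print(scaled_score(rmse), scaled_score(mae))
```

Computing `mean_absolute_error(y_train2, p_test)` for the linear model would put all four models on the same footing.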


# Ridge Regression

In [27]:
from sklearn.linear_model import Ridge,Lasso
from sklearn.model_selection import GridSearchCV

In [28]:
lambdas=np.linspace(1,100,100)

In [29]:
params={'alpha':lambdas}

In [30]:
model=Ridge(fit_intercept=True)

In [31]:
grid_search=GridSearchCV(model,param_grid=params,cv=10,scoring='neg_mean_absolute_error')

In [32]:
grid_search.fit(x_train1,y_train1)

GridSearchCV(cv=10, error_score='raise-deprecating',
             estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=None, normalize=False, random_state=None,
                             solver='auto', tol=0.001),
             iid='warn', n_jobs=None,
             param_grid={'alpha': array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
        23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,...
        34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,
        45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,
        56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,
        67.,  68.,  69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,
        78.,  79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,
        89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.,
       100.]

In [33]:
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")

In [34]:
grid_search.best_estimator_
report(grid_search.cv_results_,100)

Model with rank: 1
Mean validation score: -822.399 (std: 24.466)
Parameters: {'alpha': 75.0}

Model with rank: 2
Mean validation score: -822.399 (std: 24.471)
Parameters: {'alpha': 76.0}

Model with rank: 3
Mean validation score: -822.399 (std: 24.461)
Parameters: {'alpha': 74.0}

Model with rank: 4
Mean validation score: -822.400 (std: 24.476)
Parameters: {'alpha': 77.0}

Model with rank: 5
Mean validation score: -822.400 (std: 24.456)
Parameters: {'alpha': 73.0}

Model with rank: 6
Mean validation score: -822.401 (std: 24.451)
Parameters: {'alpha': 72.0}

Model with rank: 7
Mean validation score: -822.401 (std: 24.480)
Parameters: {'alpha': 78.0}

Model with rank: 8
Mean validation score: -822.402 (std: 24.447)
Parameters: {'alpha': 71.0}

Model with rank: 9
Mean validation score: -822.402 (std: 24.485)
Parameters: {'alpha': 79.0}

Model with rank: 10
Mean validation score: -822.403 (std: 24.442)
Parameters: {'alpha': 70.0}

Model with rank: 11
Mean validation score: -822.404 (std: 2

In [35]:
grid_search.cv_results_

{'mean_fit_time': array([0.00870051, 0.00760033, 0.00800047, 0.00790052, 0.00770049,
        0.00810044, 0.00750046, 0.00780039, 0.00800049, 0.01040063,
        0.00690033, 0.00672507, 0.0062501 , 0.00705011, 0.00655029,
        0.00637507, 0.0062501 , 0.00612507, 0.00687501, 0.00612512,
        0.00575008, 0.00732539, 0.0067004 , 0.00637512, 0.0062501 ,
        0.00637507, 0.00637512, 0.00662508, 0.00625007, 0.00637503,
        0.00587513, 0.0063751 , 0.00687509, 0.00637507, 0.00787518,
        0.00612504, 0.00637503, 0.00625005, 0.00662508, 0.0063751 ,
        0.00612509, 0.00775015, 0.00625014, 0.00662503, 0.00725009,
        0.00637512, 0.00587502, 0.00612509, 0.00637512, 0.00575008,
        0.00574996, 0.00612509, 0.00587497, 0.00600016, 0.00700009,
        0.00800006, 0.00837507, 0.00812519, 0.00875018, 0.00837514,
        0.0076252 , 0.00800011, 0.00650008, 0.00662508, 0.00625   ,
        0.00675018, 0.0065001 , 0.00562503, 0.00662515, 0.00562508,
        0.0062501 , 0.00625014,

In [36]:
from sklearn.metrics import mean_absolute_error

In [37]:
predicted_rdg=grid_search.predict(x_train2)
mn2= mean_absolute_error(y_train2,predicted_rdg)
print(1-(mn2/1660))

0.4913863055213584
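
Besides the ranked report, a fitted `GridSearchCV` exposes the winning setting directly through `best_params_` and `best_score_`. The sketch below uses synthetic data in place of `x_train1` and `y_train1`, and 5 folds instead of 10 to keep it quick; both are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (made up); the notebook itself uses x_train1, y_train1.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=200)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.linspace(1, 100, 100)},
    cv=5,
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)

# best_params_ holds the winning alpha; best_score_ is the negated MAE,
# so flipping the sign gives the cross-validated MAE.
best_alpha = search.best_params_["alpha"]
cv_mae = -search.best_score_
print(best_alpha, cv_mae)
```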


# Lasso Regression

In [38]:
lambdas=np.linspace(0.1,20,200)

In [39]:
params={'alpha':lambdas}

In [40]:
model=Lasso(fit_intercept=True)

In [41]:
grid_search=GridSearchCV(model,param_grid=params,cv=20,scoring='neg_mean_absolute_error')

In [42]:
grid_search.fit(x_train1,y_train1)


GridSearchCV(cv=20, error_score='raise-deprecating',
             estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=1000, normalize=False, positive=False,
                             precompute=False, random_state=None,
                             selection='cyclic', tol=0.0001, warm_start=False),
             iid='warn', n_jobs=None,
             param_grid={'alpha': array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ,  1.1,
        1.2,  1.3,  1.4,  1.5,  1.6,  1...
       14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15. , 15.1, 15.2, 15.3, 15.4,
       15.5, 15.6, 15.7, 15.8, 15.9, 16. , 16.1, 16.2, 16.3, 16.4, 16.5,
       16.6, 16.7, 16.8, 16.9, 17. , 17.1, 17.2, 17.3, 17.4, 17.5, 17.6,
       17.7, 17.8, 17.9, 18. , 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7,
       18.8, 18.9, 19. , 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7, 19.8,
       19.9, 20. ])},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,


In [43]:
grid_search.best_estimator_
report(grid_search.cv_results_,100)

Model with rank: 1
Mean validation score: -820.534 (std: 47.514)
Parameters: {'alpha': 7.3999999999999995}

Model with rank: 2
Mean validation score: -820.534 (std: 47.514)
Parameters: {'alpha': 7.599999999999999}

Model with rank: 3
Mean validation score: -820.534 (std: 47.514)
Parameters: {'alpha': 7.499999999999999}

Model with rank: 4
Mean validation score: -820.535 (std: 47.513)
Parameters: {'alpha': 7.699999999999999}

Model with rank: 5
Mean validation score: -820.535 (std: 47.513)
Parameters: {'alpha': 7.299999999999999}

Model with rank: 6
Mean validation score: -820.535 (std: 47.511)
Parameters: {'alpha': 7.799999999999999}

Model with rank: 7
Mean validation score: -820.537 (std: 47.508)
Parameters: {'alpha': 7.899999999999999}

Model with rank: 8
Mean validation score: -820.538 (std: 47.511)
Parameters: {'alpha': 7.199999999999999}

Model with rank: 9
Mean validation score: -820.540 (std: 47.505)
Parameters: {'alpha': 7.999999999999999}

Model with rank: 10
Mean validation 

In [44]:
grid_search.cv_results_

{'mean_fit_time': array([0.30815262, 0.28701638, 0.19093478, 0.14401783, 0.139008  ,
        0.10070575, 0.08095462, 0.06780392, 0.0786545 , 0.05360179,
        0.05131332, 0.04700071, 0.04281311, 0.03931311, 0.04081309,
        0.03525053, 0.03556303, 0.03187554, 0.0318755 , 0.0321255 ,
        0.02900044, 0.02806298, 0.02787542, 0.02743789, 0.02693789,
        0.02393782, 0.0241254 , 0.02362535, 0.0245629 , 0.02756286,
        0.02718788, 0.02125032, 0.02306283, 0.02056277, 0.02231282,
        0.01968781, 0.01962534, 0.01906278, 0.01881281, 0.01862528,
        0.01931279, 0.01950033, 0.01743773, 0.01768776, 0.01743773,
        0.01700025, 0.01787527, 0.01762521, 0.01675025, 0.01643776,
        0.01625026, 0.01581277, 0.0172502 , 0.01712526, 0.0163128 ,
        0.01468774, 0.01500027, 0.01493775, 0.01487521, 0.01462523,
        0.01487522, 0.01406273, 0.01462523, 0.01506277, 0.01350021,
        0.01362519, 0.0138127 , 0.01343771, 0.01425022, 0.01325017,
        0.01331272, 0.01462522,

In [45]:
predicted_ir1=grid_search.predict(x_train2)
mn1= mean_absolute_error(y_train2,predicted_ir1)
print(1-(mn1/1660))

0.49202503971492884
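
Lasso is fit by coordinate descent, which can reach `max_iter` before meeting its tolerance, particularly for small `alpha`; scikit-learn then emits a `ConvergenceWarning`. One common response is to raise `max_iter`. The sketch below shows how to count such warnings on synthetic data; the value 10000 is an arbitrary illustrative choice, not a setting used in this notebook.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso

# Synthetic data (made up) with two informative features out of ten.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 4.0 - X[:, 1] * 2.0 + rng.normal(scale=0.5, size=200)

# Record any warnings raised while fitting with a larger iteration budget.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model = Lasso(alpha=0.1, max_iter=10000).fit(X, y)

n_convergence = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print(n_convergence, model.n_iter_)
```

Passing `Lasso(max_iter=...)` as the estimator of the grid search would apply the same budget to every candidate `alpha`.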


# Random Forest Regression

In [46]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

In [47]:
rf = RandomForestRegressor()

In [48]:
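# Note: scikit-learn 1.0 renamed the 'mse' criterion to 'squared_error'; 'mse' was removed in 1.2
params = { 'criterion':['mse'],
    'max_depth':[None, 3, 5, 8],
    'min_samples_split':[2,5,8]}

In [49]:
# The grid has only 1 x 4 x 3 = 12 combinations, so the search tries 12 candidates even though n_iter=50
random = RandomizedSearchCV(rf, param_distributions=params, n_iter = 50,cv=10,scoring='neg_mean_absolute_error', n_jobs = -1, 
                           verbose = 2)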

In [50]:
random.fit(x_train1,y_train1)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.


Fitting 10 folds for each of 12 candidates, totalling 120 fits


[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    6.1s
[Parallel(n_jobs=-1)]: Done 120 out of 120 | elapsed:   10.1s finished


RandomizedSearchCV(cv=10, error_score='raise-deprecating',
                   estimator=RandomForestRegressor(bootstrap=True,
                                                   criterion='mse',
                                                   max_depth=None,
                                                   max_features='auto',
                                                   max_leaf_nodes=None,
                                                   min_impurity_decrease=0.0,
                                                   min_impurity_split=None,
                                                   min_samples_leaf=1,
                                                   min_samples_split=2,
                                                   min_weight_fraction_leaf=0.0,
                                                   n_estimators='warn',
                                                   n_jobs=None, oob_score=False,
                                                   random_state=

In [52]:
random.best_estimator_
report(random.cv_results_,100)

Model with rank: 1
Mean validation score: -748.388 (std: 24.039)
Parameters: {'min_samples_split': 8, 'max_depth': 5, 'criterion': 'mse'}

Model with rank: 2
Mean validation score: -749.096 (std: 22.769)
Parameters: {'min_samples_split': 5, 'max_depth': 5, 'criterion': 'mse'}

Model with rank: 3
Mean validation score: -749.113 (std: 21.122)
Parameters: {'min_samples_split': 2, 'max_depth': 5, 'criterion': 'mse'}

Model with rank: 4
Mean validation score: -755.636 (std: 23.091)
Parameters: {'min_samples_split': 8, 'max_depth': 8, 'criterion': 'mse'}

Model with rank: 5
Mean validation score: -756.828 (std: 24.612)
Parameters: {'min_samples_split': 2, 'max_depth': 8, 'criterion': 'mse'}

Model with rank: 6
Mean validation score: -757.073 (std: 29.261)
Parameters: {'min_samples_split': 5, 'max_depth': 8, 'criterion': 'mse'}

Model with rank: 7
Mean validation score: -795.358 (std: 27.249)
Parameters: {'min_samples_split': 5, 'max_depth': None, 'criterion': 'mse'}

Model with rank: 8
Mean 

In [53]:
random.cv_results_

{'mean_fit_time': array([0.55429571, 0.48651834, 0.44296789, 0.10912666, 0.11472986,
        0.124032  , 0.16652839, 0.16712949, 0.16215265, 0.24420483,
        0.24300382, 0.23767939]),
 'std_fit_time': array([0.03107513, 0.06797621, 0.03232154, 0.00567495, 0.01869488,
        0.0249795 , 0.01485097, 0.01164865, 0.00711346, 0.00944122,
        0.00984593, 0.0109092 ]),
 'mean_score_time': array([0.00637538, 0.00522532, 0.00490017, 0.00362506, 0.0036252 ,
        0.00367517, 0.00337505, 0.0036001 , 0.00387509, 0.00437503,
        0.00387514, 0.00405006]),
 'std_score_time': array([0.00143732, 0.00089765, 0.00029999, 0.000375  , 0.00043665,
        0.00044791, 0.00057287, 0.00046366, 0.00037493, 0.00062494,
        0.00037499, 0.00074005]),
 'param_min_samples_split': masked_array(data=[2, 5, 8, 2, 5, 8, 2, 5, 8, 2, 5, 8],
              mask=[False, False, False, False, False, False, False, False,
                    False, False, False, False],
        fill_value='?',
             dtyp

In [54]:
from sklearn.metrics import mean_absolute_error

In [55]:
predicted_rf=random.predict(x_train2)
mn_rf= mean_absolute_error(y_train2,predicted_rf)
print(1-(mn_rf/1660))

0.5418146012983092
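
The forest above is created without a seed, so its scores can differ from run to run. A sketch of one way to make the result repeatable, assuming a fixed `random_state` and an explicit `n_estimators` (both values are illustrative choices), on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic data (made up) standing in for the notebook's features and sales.
rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 4))
y = 1000 * X[:, 0] + 500 * (X[:, 1] > 0.5) + rng.normal(scale=50, size=300)

def fit_and_score(seed):
    # max_depth and min_samples_split mirror the top-ranked search result.
    forest = RandomForestRegressor(n_estimators=100, max_depth=5,
                                   min_samples_split=8, random_state=seed)
    forest.fit(X[:240], y[:240])
    return mean_absolute_error(y[240:], forest.predict(X[240:]))

# The same seed reproduces the same hold-out error.
mae_a = fit_and_score(42)
mae_b = fit_and_score(42)
print(mae_a, mae_b)
```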


In [56]:
print('Random Forest Regression : {0:.2f}'.format(1-(mn_rf/1660)))
print('-------------------------------------------------------')
print('Linear Regression with Lasso : {0:.2f}'.format(1-(mn1/1660)))
print('-------------------------------------------------------')
print('Linear Regression with Ridge : {0:.2f}'.format(1-(mn2/1660)))
print('-------------------------------------------------------')
print('Linear Regression : {0:.2f}'.format(1-(mn/1660)))
print('-------------------------------------------------------')

Random Forest Regression : 0.54
-------------------------------------------------------
Linear Regression with Lasso : 0.49
-------------------------------------------------------
Linear Regression with Ridge : 0.49
-------------------------------------------------------
Linear Regression : 0.32
-------------------------------------------------------


In [57]:
test_pred=random.predict(cf_test)

In [58]:
pd.DataFrame({"Counterfeit_Sales":test_pred}).to_csv("Bharath_Reddy_Python_Project.csv",index=False)
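
Because `Medicine_ID` is dropped early in the data preparation, the saved file contains predictions only. If each prediction needs to be traceable to a medicine, the IDs can be set aside before dropping the column and written next to the predictions. This is a sketch under that assumption: the IDs are copied from the test preview, the predictions are made up, and the file name `submission_sketch.csv` is hypothetical.

```python
import os
import tempfile
import numpy as np
import pandas as pd

# IDs from the test preview; predictions are made-up stand-ins for test_pred.
toy_ids = pd.Series(["HLZ81", "ECE94", "SAD14"], name="Medicine_ID")
toy_pred = np.array([1500.0, 2750.5, 980.25])

submission = pd.DataFrame({"Medicine_ID": toy_ids,
                           "Counterfeit_Sales": toy_pred})

# Hypothetical file name; written to a temporary directory for the demo.
out_path = os.path.join(tempfile.mkdtemp(), "submission_sketch.csv")
submission.to_csv(out_path, index=False)

# Read the file back to confirm its layout.
check = pd.read_csv(out_path)
print(check)
```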

# Conclusion
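I applied linear, lasso, and ridge regression and a random forest model. On the hold-out split, the random forest performed best, with a score of 0.54, where the score is 1 - MAE/1660; this corresponds to a mean absolute error of about 761. Lasso and ridge both scored 0.49. The linear regression score of 0.32 was computed from RMSE rather than MAE, so it is not directly comparable with the others.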