<center><h1 class="list-group-item list-group-item-success">Retail Prices Of Commodities In India</h1></center>
<img src = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSQGblZ3lAcUjYRXf1BlooEem7PeNM4UHUS-w&usqp=CAU" style = "width:30%">
<h3>Context</h3>
<p>In light of the recent surge in petrol prices, the prices of several day-to-day commodities have increased (or are expected to increase in the future). Let us take a deep dive into how the prices of different commodities in India have varied over time.</p>

<h3>Content</h3>
    
<p>The dataset contains retail prices of commodities such as fruits, vegetables and clothes. Prices are recorded both weekly and monthly across different states and major districts or market centers. This notebook works with the monthly food prices file.</p>

The commodities have generalized categories and do not contain brand information. 

<ul>
<li>State: Indian state. Example: Maharashtra, Madhya Pradesh, Rajasthan, etc.</li>
<li>Centre: Market center. Example: Mumbai, Pune, Bangalore, etc.</li>
<li>Commodity: Name of the commodity. Example: Fish, Apple, Dhoti, Saree, Ghee, etc.</li>
<li>Variety: Subtype of the commodity. Example: for Apple, one variety is "Delicious medium size".</li>
<li>Unit: Measurement unit. Example: Kg, Litre, Dozens, etc.</li>
<li>Category: Food or Non Food.</li>
<li>Date: The week (weekly data) or the month (monthly data) to which the price refers.</li>
<li>Retail Price: Retail price of the commodity in rupees.</li>
</ul>
<br>
    
<h3>Contents:</h3>
<font size = 3.5 color = "blue">
<ul>
<li>Importing Packages</li>
<li>Importing Data</li>
<li>Analysing Data</li>
<li>Data Overview</li>
<li>One Hot encoding</li>
<li>Data Transformation</li>
<li>Predicting the Retail Prices</li>
<li>Training Models</li>
<li>Evaluation Metrics</li>
</ul>
</font>

# Importing Packages

In [None]:
import pandas as pd
import numpy as np
import category_encoders as ce
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split, RepeatedKFold, GridSearchCV
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# pandas-profiling has been renamed to ydata-profiling; fall back to the old name
try:
    import ydata_profiling as pp
except ImportError:
    import pandas_profiling as pp

# Importing Data

In [None]:
df = pd.read_csv("../input/retail-prices-of-commodities-in-india/Monthly_Food_Retail_Prices.csv")

In [None]:
df
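Before any processing, it can help to confirm that the CSV has the columns the later cells use. The sketch below is a minimal check under the assumption that the needed columns are exactly those referenced in this notebook's code; `EXPECTED_COLUMNS`, `missing_columns` and the toy frame are hypothetical names and values for illustration only.

```python
import pandas as pd

# Columns the later cells rely on (inferred from the code in this notebook)
EXPECTED_COLUMNS = ("State", "Centre", "Commodity", "Variety", "Unit",
                    "Category", "Date", "Retail Price")


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from `frame`."""
    return [col for col in expected if col not in frame.columns]


# Hypothetical toy frame, used only to exercise the check
toy = pd.DataFrame({"State": ["Maharashtra"], "Centre": ["Mumbai"],
                    "Commodity": ["Apple"], "Retail Price": [120.0]})
print(missing_columns(toy))  # ['Variety', 'Unit', 'Category', 'Date']
```

On the real data, `missing_columns(df)` should return an empty list.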

# Analysing Data

In [None]:
pp.ProfileReport(df)

In [None]:
for col in df.columns:
    print("Column name:", col, "\n", df[col].value_counts())

In [None]:
df.isnull().sum()
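Raw counts are easier to judge as a share of all rows, for example when deciding whether a column should be filled with a placeholder (as done below for `Commodity` and `Variety`) or treated as the target to predict. This is a small sketch; `missing_report` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most missing first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({"missing": counts,
                           "percent": 100 * counts / len(frame)})
    return report.sort_values("missing", ascending=False)


# Hypothetical toy data: two missing varieties, one missing price
toy = pd.DataFrame({"Variety": ["FAQ", None, None, "Other"],
                    "Retail Price": [40.0, np.nan, 55.0, 60.0]})
print(missing_report(toy))
```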

# Data Overview

In [None]:
df.describe()

In [None]:
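# Fill missing labels instead of dropping rows; "FAQ" (Fair Average Quality) is used as the default variety
df["Commodity"] = df["Commodity"].fillna("Not Available")
df["Variety"] = df["Variety"].fillna("FAQ")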

In [None]:
df.isnull().sum()

In [None]:
df["State"].value_counts()

# One Hot encoding

In [None]:
def one_hot_encoding(df,col):
    one_hot_encoder=ce.OneHotEncoder(cols=col,return_df=True,use_cat_names=True)
    df_final = one_hot_encoder.fit_transform(df)
    return df_final
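For comparison, pandas can produce a similar 0/1 encoding without `category_encoders`. The sketch below swaps in `pd.get_dummies`, which is a different tool from the encoder used above: it does not remember the categories it saw, so it should be applied to the full frame before any train/test split. `one_hot_with_pandas` and the toy data are hypothetical.

```python
import pandas as pd


def one_hot_with_pandas(frame: pd.DataFrame, cols) -> pd.DataFrame:
    """One-hot encode `cols` with pandas; new columns are named '<column>_<category>'."""
    return pd.get_dummies(frame, columns=cols, dtype=int)


# Hypothetical toy data
toy = pd.DataFrame({"State": ["Kerala", "Assam", "Kerala"],
                    "Retail Price": [30.0, 25.0, 32.0]})
encoded = one_hot_with_pandas(toy, ["State"])
print(encoded.columns.tolist())  # ['Retail Price', 'State_Assam', 'State_Kerala']
```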

In [None]:
# Encode all categorical columns in a single call
df = one_hot_encoding(df, ["State", "Centre", "Commodity", "Variety", "Unit"])

In [None]:
df

# Data Transformation

In [None]:
df[["Month", "Year"]] = df["Date"].str.split("-", n=1, expand=True)
df["Year"] = df["Year"].astype(int)  # split() leaves the year as a string

In [None]:
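# Map three-letter month abbreviations (e.g. "JAN") to month numbers 1-12
MONTHS = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}
def split_Date(val):
    """Return the month number for an abbreviation such as 'JAN'."""
    return MONTHS[val]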

In [None]:
df["Month"] = df["Month"].apply(split_Date)
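The split and month lookup can also be done in one vectorised step. This sketch assumes `Date` strings have the form `<MON>-<YEAR>` (for example `JAN-2015`), which is what the split on `-` and the month dictionary above imply; `parse_month_year` and `MONTH_NUMBERS` are hypothetical names. Unlike a dictionary lookup inside `apply`, `Series.map` returns NaN for an unrecognised abbreviation instead of raising `KeyError`, so bad values can be found with `isna()`.

```python
import pandas as pd

# Assumed upper-case three-letter abbreviations, as in the mapping above
MONTH_NUMBERS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
                 "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}


def parse_month_year(dates: pd.Series) -> pd.DataFrame:
    """Split strings like 'JAN-2015' into integer Month and Year columns."""
    parts = dates.str.split("-", n=1, expand=True)
    return pd.DataFrame({
        # Normalise case and whitespace before the lookup; unknown values become NaN
        "Month": parts[0].str.strip().str.upper().map(MONTH_NUMBERS),
        "Year": parts[1].astype(int),
    })


out = parse_month_year(pd.Series(["JAN-2015", "dec-2016"]))
print(out)
```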

In [None]:
df

In [None]:
columns_to_be_removed = ["Category","Date"]

In [None]:
df.drop(columns_to_be_removed,axis = 1,inplace= True)

In [None]:
df_test = df[df["Retail Price"].isna()].copy()  # rows with a missing price; .copy() avoids SettingWithCopyWarning later


In [None]:
df_train = df.dropna(subset=["Retail Price"])  # rows with a known price
df_train
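`df.dropna()` drops a row if *any* column is missing, so the train and test parts above only partition the data when `Retail Price` is the sole column with gaps. A split keyed on the target alone always partitions the frame. This is a sketch; `split_by_target` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def split_by_target(frame: pd.DataFrame, target: str = "Retail Price"):
    """Return (rows with a known target, rows whose target is missing)."""
    missing = frame[target].isna()
    return frame[~missing].copy(), frame[missing].copy()


# Hypothetical toy data: row 2 has a known price but a missing feature
toy = pd.DataFrame({"x": [1.0, 2.0, np.nan],
                    "Retail Price": [10.0, np.nan, 30.0]})
known, unknown = split_by_target(toy)
print(len(known), len(unknown), len(toy.dropna()))  # 2 1 1
```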

# Predicting the Retail Prices

In [None]:
X_train = df_train.drop("Retail Price",axis = 1)
X_test = df_test.drop("Retail Price",axis = 1)
Y_train = df_train["Retail Price"]

In [None]:
lr = LinearRegression()

In [None]:
lr.fit(X_train,Y_train)

In [None]:
Y_pred = lr.predict(X_test)

In [None]:
df_test["Retail Price"] = Y_pred

In [None]:
df_test
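The steps in this section (fit on rows with a price, predict the missing prices, recombine) amount to regression imputation of the target. Below is a compact sketch of the same idea as one function; `impute_target` is a hypothetical helper, and it assumes every non-target column is numeric (true here after one-hot encoding). Unlike concatenating train and test parts, it keeps the original row order.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def impute_target(frame: pd.DataFrame, target: str = "Retail Price", model=None) -> pd.DataFrame:
    """Fill missing values of `target` with predictions from a regressor.

    The model is fit on rows where the target is known, using all other
    (numeric) columns as features. Returns a copy; `frame` is not modified.
    """
    model = LinearRegression() if model is None else model
    missing = frame[target].isna()
    features = frame.drop(columns=target)
    model.fit(features[~missing], frame.loc[~missing, target])
    filled = frame.copy()
    if missing.any():
        filled.loc[missing, target] = model.predict(features[missing])
    return filled


# Hypothetical toy data where price = 2 * x + 1 exactly
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "Retail Price": [3.0, 5.0, 7.0, 9.0, np.nan]})
out = impute_target(toy)
print(out.loc[4, "Retail Price"])  # close to 11.0
```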

In [None]:
df_final = pd.concat([df_train,df_test],axis = 0)

In [None]:
df_final = df_final.reset_index(drop=True)

In [None]:
df_final

In [None]:
X = df_final.drop("Retail Price",axis = 1)
Y = df_final["Retail Price"]

In [None]:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,random_state=27,test_size = 0.25)
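Note that `df_final` contains the rows whose prices were predicted by `lr` itself. Those prices are an exact linear function of the features, so if they land in `Y_test` they can make the linear models look better than they are. One option, sketched here under that concern, is to split only the rows with observed prices. `split_observed` and the `imputed` mask are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_observed(frame, imputed, target="Retail Price", test_size=0.25, random_state=27):
    """Train/test split over rows whose target was observed, not imputed.

    `imputed` is a boolean array aligned with `frame` marking model-filled targets.
    """
    observed = frame.loc[~imputed]
    return train_test_split(observed.drop(columns=target), observed[target],
                            test_size=test_size, random_state=random_state)


# Hypothetical toy frame: pretend the last two prices were imputed
toy = pd.DataFrame({"x": np.arange(8.0), "Retail Price": np.arange(8.0) * 2})
imputed = np.arange(len(toy)) >= 6
X_tr, X_te, y_tr, y_te = split_observed(toy, imputed)
```

Because `df_final` is `df_train` followed by `df_test` with a reset index, the mask for this notebook could be built as `np.arange(len(df_final)) >= len(df_train)`.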

# Training Models

In [None]:
lr = LinearRegression()

In [None]:
lr.fit(X_train,Y_train)

In [None]:
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, Y_train)
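`alpha=1.0` is Ridge's default rather than a tuned value. `GridSearchCV` and `RepeatedKFold`, already imported at the top, can choose it by cross-validation. This is a sketch on synthetic data; `tune_ridge_alpha`, the alpha grid and the fold settings are assumptions, not values from the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold


def tune_ridge_alpha(X, y, alphas=(0.01, 0.1, 1.0, 10.0, 100.0), random_state=27):
    """Pick Ridge's alpha by repeated 5-fold cross-validation scored on R^2."""
    cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=random_state)
    search = GridSearchCV(Ridge(), param_grid={"alpha": list(alphas)},
                          scoring="r2", cv=cv)
    search.fit(X, y)
    return search.best_params_["alpha"], search.best_score_


# Synthetic data only, to exercise the function
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0, 3.0, 0.5]) + rng.normal(scale=0.1, size=200)
best_alpha, best_r2 = tune_ridge_alpha(X_demo, y_demo)
print(best_alpha, best_r2)
```

On the notebook's data the call would be `tune_ridge_alpha(X_train, Y_train)`; with thousands of one-hot columns this may take noticeably longer.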

# Evaluation Metrics

In [None]:
Y_pred_lr = lr.predict(X_test)

In [None]:
r2_score(Y_test, Y_pred_lr)  # argument order is r2_score(y_true, y_pred)

In [None]:
Y_pred_ridge = ridge.predict(X_test)

In [None]:
r2_score(Y_test, Y_pred_ridge)  # argument order is r2_score(y_true, y_pred)
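R² alone says little about the size of the errors in rupees, and it is not symmetric in its arguments: scikit-learn's metrics take `(y_true, y_pred)` in that order. The sketch below reports R² together with MAE and RMSE (both in the target's units), using the `mean_absolute_error` and `mean_squared_error` functions imported at the top. `regression_report` and the toy values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """R^2, MAE and RMSE; arguments follow scikit-learn's (y_true, y_pred) order."""
    return {
        "r2": r2_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
report = regression_report(y_true, y_pred)
print(report)                    # r2 = 0.5, mae = 1/3, rmse = sqrt(1/3)
print(r2_score(y_pred, y_true))  # 33/42, about 0.786: swapping the arguments changes R^2
```

For this notebook, `regression_report(Y_test, Y_pred_lr)` and `regression_report(Y_test, Y_pred_ridge)` would give comparable summaries of both models.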

# Thank You 🤗
### I hope you enjoyed reading my notebook. Please do upvote and comment! 😎