## Socialcops Challenge

#### Working with Monthly_data_cmo dataset.

#### 1. Import required packages

In [1]:
%matplotlib inline
%config InlineBackend.figure_format='retina'

from __future__ import absolute_import, division, print_function

# Data wrangling
import pandas as pd
import numpy as np


# Display and Plotting
import matplotlib.pyplot as plt
import seaborn as sns

import os

# pandas
pd.set_option('display.float_format', lambda x: '%.5f' % x) 
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

# seaborn plotting style
sns.set(style = 'ticks', context = 'poster')
sns.set(rc={'figure.figsize':(10,6)})

In [2]:
# Print version of modules
def version(package_name: str, import_as):
    return ("{} version: {}".format(package_name, import_as.__version__))

In [3]:
print(version('Pandas', pd))
print(version('Numpy', np))
print(version('Seaborn', sns))

Pandas version: 0.23.4
Numpy version: 1.15.4
Seaborn version: 0.9.0


We have successfully imported all the required packages and libraries. Now, it's time to load the dataset.

#### 2. Load the dataset

The data we are going to work on is in the Monthly_data_cmo csv file, which you can find inside the 'data/' folder. This dataset contains monthly data about the quantity arriving in the market and the minimum, maximum, and modal prices of different commodities for each APMC (Agricultural Produce Market Committee).

The main attributes or features of the dataset are:

* <b>APMC:</b> Agricultural Produce Market Committee
* <b>Commodity</b>
* <b>Year</b>
* <b>Month</b>
* <b>arrivals_in_qtl:</b> Quantity arrival in market (in quintal)
* <b>min_price:</b> Minimum price charged per quintal
* <b>max_price:</b> Maximum price charged per quintal
* <b>modal_price:</b> Mode (Average) price charged per quintal
* <b>date</b>
* <b>district_name</b>
* <b>state_name</b>

In [4]:
# Get current working directory
def get_cwd():
    return os.getcwd()

# Read the dataset
def read_dataset(filename):
    data = os.path.join(get_cwd(), 'data', filename)
    return (pd.read_csv(data))

In [5]:
# Look at the 'cmo_monthly' dataset
cmo_monthly = read_dataset("Monthly_data_cmo.csv")
cmo_monthly.head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name,state_name
0,Ahmednagar,Bajri,2015,April,79,1406,1538,1463,2015-04,Ahmadnagar,Maharashtra
1,Ahmednagar,Bajri,2016,April,106,1788,1925,1875,2016-04,Ahmadnagar,Maharashtra
2,Ahmednagar,Wheat(Husked),2015,April,1253,1572,1890,1731,2015-04,Ahmadnagar,Maharashtra
3,Ahmednagar,Wheat(Husked),2016,April,387,1750,2220,1999,2016-04,Ahmadnagar,Maharashtra
4,Ahmednagar,Sorgum(Jawar),2015,April,3825,1600,2200,1900,2015-04,Ahmadnagar,Maharashtra


As we found in our previous notebook, <b>Monthly_data_cmo</b> version 1, the <b>state_name</b> attribute has <b>Maharashtra</b> as its only value for all the entities. So we can simply remove it from the dataset before further processing.
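A quick way to confirm that a column is constant before dropping it is to count its distinct values. The sketch below is only an illustration: `toy` is a made-up frame, not the real dataset, and only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the column names match.
toy = pd.DataFrame({
    "APMC": ["Ahmednagar", "Pathardi", "Manchar"],
    "state_name": ["Maharashtra", "Maharashtra", "Maharashtra"],
})

# A column with a single distinct value adds nothing to the analysis.
n_states = toy["state_name"].nunique()
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]

print(n_states)
print(constant_cols)
```

On the real data the same check would be run on `cmo_monthly` before the column is removed.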

#### 3. EDA

##### 3.1 state_name

In [6]:
# Remove state_name attribute
state_name = cmo_monthly.pop('state_name')
cmo_monthly.head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name
0,Ahmednagar,Bajri,2015,April,79,1406,1538,1463,2015-04,Ahmadnagar
1,Ahmednagar,Bajri,2016,April,106,1788,1925,1875,2016-04,Ahmadnagar
2,Ahmednagar,Wheat(Husked),2015,April,1253,1572,1890,1731,2015-04,Ahmadnagar
3,Ahmednagar,Wheat(Husked),2016,April,387,1750,2220,1999,2016-04,Ahmadnagar
4,Ahmednagar,Sorgum(Jawar),2015,April,3825,1600,2200,1900,2015-04,Ahmadnagar


##### 3.2 district_name

From our past analysis we know that <b>district_name</b> has a total of <b>33 unique</b> categories. Among them, the <b>top 5 categories that appear in the largest number of entities are:</b>

* <b>Pune: </b>appears in 6366 entities.
* <b>Ahmadnagar: </b>appears in 4638 entities.
* <b>Nagpur: </b>appears in 4527 entities.
* <b>Solapur: </b>appears in 4524 entities.
* <b>Nasik: </b>appears in 3620 entities.

Let's list all 33 unique categories:

In [7]:
# List each category with its number of occurrences
cmo_monthly['district_name'].value_counts().sort_values(ascending = False)

Pune          6366
Ahmadnagar    4638
Nagpur        4527
Solapur       4524
Nasik         3620
Satara        2771
Buldhana      2669
Amaravathi    2590
Jalgaon       2579
Aurangabad    2312
Beed          1916
Thane         1854
Osmanabad     1737
Kolhapur      1704
Mumbai        1656
Jalna         1489
Yewatmal      1455
Latur         1446
Dhule         1328
Parbhani      1312
Chandrapur    1215
Sangli        1131
Akola         1084
Nanded        1042
Nandurbar     1001
Wasim          964
Hingoli        776
Wardha         630
Raigad         610
Bhandara       540
Ratnagiri      457
Gadchiroli     360
Gondiya        126
Name: district_name, dtype: int64

##### 3.3 date

Now, let's convert our <b>date</b> attribute into proper <b>YYYY-MM-DD</b> format.

In [8]:
# Change date attribute to date time
cmo_monthly['date'] = pd.to_datetime(cmo_monthly.date)
cmo_monthly.head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name
0,Ahmednagar,Bajri,2015,April,79,1406,1538,1463,2015-04-01,Ahmadnagar
1,Ahmednagar,Bajri,2016,April,106,1788,1925,1875,2016-04-01,Ahmadnagar
2,Ahmednagar,Wheat(Husked),2015,April,1253,1572,1890,1731,2015-04-01,Ahmadnagar
3,Ahmednagar,Wheat(Husked),2016,April,387,1750,2220,1999,2016-04-01,Ahmadnagar
4,Ahmednagar,Sorgum(Jawar),2015,April,3825,1600,2200,1900,2015-04-01,Ahmadnagar


Now we can see that the format of <b>date</b> attribute has been changed.
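The conversion above relies on pandas inferring the format of the strings. As a small sketch, the format can also be stated explicitly; the two sample values are copied from the table above, and `raw_dates` and `parsed` are illustrative names.

```python
import pandas as pd

# Sample values copied from the 'date' column shown above.
raw_dates = pd.Series(["2015-04", "2016-04"])

# An explicit format avoids relying on pandas' format inference.
# With only year and month given, the day defaults to the 1st.
parsed = pd.to_datetime(raw_dates, format="%Y-%m")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

This is why every date in the converted column falls on the first day of its month.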

##### 3.4 modal_price

Now, let's move to <b>modal_price: Mode (Average) price charged per quintal</b> attribute and analyse how the data is distributed in our dataset.

In [9]:
# Check out modal_price
cmo_monthly['modal_price'].describe()

count    62429.00000
mean      3296.00399
std       3607.79253
min          0.00000
25%       1450.00000
50%       2425.00000
75%       4257.00000
max     142344.00000
Name: modal_price, dtype: float64

As we can see, the <b>standard deviation is higher than the mean</b>, which shows that our data is widely spread. The <b>minimum value of modal_price</b> is 0. Let's check it out.

In [10]:
len(cmo_monthly[cmo_monthly['modal_price'] == 0])

204

The above output shows that the <b>modal_price is equal to 0 for a total of 204 entities.</b>

In [11]:
# Let's check few of the samples
cmo_monthly[cmo_monthly['modal_price'] == 0].head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name
34598,Chandvad,Cucumber,2016,January,1,0,2000,0,2016-01-01,Nasik
38760,Pathari,Gram,2015,August,2,3651,4101,0,2015-08-01,Parbhani
38843,Pathari,Bajri,2014,December,23,1401,1539,0,2014-12-01,Parbhani
38849,Pathari,Green Gram,2014,December,7,5601,6600,0,2014-12-01,Parbhani
39034,Pathari,Bajri,2015,January,10,1493,1588,0,2015-01-01,Parbhani


We know that the 'modal_price' attribute shows the average price charged per quintal. <b>It cannot be zero: a price is never zero or negative, and the samples shown above still report a non-zero 'max_price', so a zero modal price is inconsistent with them.</b>

Hence, we will simply <b>remove the samples/entities from the dataset whose 'modal_price' value equals zero.</b> In this way we will keep only trustworthy data in our dataset.

In [12]:
# remove entities from the dataset where modal_price = 0
cmo_monthly.drop(cmo_monthly[cmo_monthly.modal_price == 0].index, inplace = True)

Successfully removed. Let's check it out.

In [13]:
len(cmo_monthly[cmo_monthly['modal_price'] == 0])

0

Now it's verified that the samples with 'modal_price' equal to zero have been removed from the dataset.
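The same clean-up can also be written as a boolean filter that returns a new frame instead of modifying the original in place. This is a sketch on made-up rows; `toy` and `cleaned` are hypothetical names, and the two zero rows only mimic the records shown earlier.

```python
import pandas as pd

# Hypothetical rows; two of them mimic the zero modal_price records above.
toy = pd.DataFrame({
    "Commodity": ["Gram", "Bajri", "Cucumber", "Wheat(Husked)"],
    "modal_price": [0, 1463, 0, 1731],
})

# Keep only rows with a positive modal price; `toy` itself is left untouched.
cleaned = toy[toy["modal_price"] > 0]

print(len(toy), len(cleaned))
```

Because the original frame is not modified, the filtered rows stay available for inspection, and the original row labels are preserved in `cleaned`.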

##### 3.5 max_price and min_price

Let's move on to the <b>max_price and min_price</b> attributes.

With these attributes we can check the <b>difference between max_price and min_price</b>. The <b>max_price</b> attribute shows the <b>maximum price per quintal</b> and the <b>min_price</b> attribute shows the <b>minimum price per quintal</b>.

The max_price of an entity should never be less than its min_price. But as we found previously, this does happen in our dataset. So let's check where it happens.

In [14]:
# Calculating the difference
def diff_price(max_price, min_price):
    return (max_price - min_price)

Let's create an attribute that contains the difference in price.

In [15]:
cmo_monthly['diff_price'] = diff_price(cmo_monthly.max_price, cmo_monthly.min_price)
cmo_monthly.head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name,diff_price
0,Ahmednagar,Bajri,2015,April,79,1406,1538,1463,2015-04-01,Ahmadnagar,132
1,Ahmednagar,Bajri,2016,April,106,1788,1925,1875,2016-04-01,Ahmadnagar,137
2,Ahmednagar,Wheat(Husked),2015,April,1253,1572,1890,1731,2015-04-01,Ahmadnagar,318
3,Ahmednagar,Wheat(Husked),2016,April,387,1750,2220,1999,2016-04-01,Ahmadnagar,470
4,Ahmednagar,Sorgum(Jawar),2015,April,3825,1600,2200,1900,2015-04-01,Ahmadnagar,600


In [17]:
# check out diff_price
cmo_monthly['diff_price'].describe()

count      62225.00000
mean         741.15839
std        14473.78495
min     -3149973.00000
25%          148.00000
50%          384.00000
75%          867.00000
max      1596090.00000
Name: diff_price, dtype: float64

As we can see from the output above, the standard deviation is much higher than the mean, which shows that our data is widely distributed.
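One way to put a number on "widely spread" is the coefficient of variation, the standard deviation divided by the mean. The sketch below only reuses the figures printed by the two `describe()` calls above; the variable names are illustrative.

```python
# Summary statistics copied from the describe() outputs above.
modal_mean, modal_std = 3296.00399, 3607.79253
diff_mean, diff_std = 741.15839, 14473.78495

# Coefficient of variation: standard deviation relative to the mean.
cv_modal = modal_std / modal_mean
cv_diff = diff_std / diff_mean

print(round(cv_modal, 2), round(cv_diff, 2))
```

The ratio is roughly 1.1 for modal_price but roughly 19.5 for diff_price, which suggests that a few extreme values dominate the spread of diff_price.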

We can also see that the minimum value is negative, which should not be possible. Let's check all the entities that have values less than zero.

In [18]:
# diff_price less than zero
len(cmo_monthly[cmo_monthly['diff_price'] < 0])

269

The above output shows that there are a total of <b>269 entities in our dataset that are inconsistent according to the diff_price attribute</b>. In these 269 entities the max_price is less than the min_price (diff_price < 0), which is not possible.

In [20]:
# check few of them
cmo_monthly[cmo_monthly['diff_price'] < 0].head()

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name,diff_price
384,Ahmednagar,Coriander,2015,August,4230,5,1,7,2015-08-01,Ahmadnagar,-4
1153,Pathardi,Cotton,2015,February,2300,6805,3873,3798,2015-02-01,Ahmadnagar,-2932
2237,Pathardi,Bajri,2015,June,413,2600,1446,1325,2015-06-01,Ahmadnagar,-1154
2564,Pathardi,Pigeon Pea (Tur),2015,March,16,5000,3025,5100,2015-03-01,Ahmadnagar,-1975
3175,Ahmednagar,Garlic,2015,November,53,14500,8813,7156,2015-11-01,Ahmadnagar,-5687


So let's remove all the entities from our dataset that have <b>diff_price < 0</b>.

In [21]:
# remove entities from the dataset where diff_price < 0
cmo_monthly.drop(cmo_monthly[cmo_monthly.diff_price < 0].index, inplace = True)

Now, we successfully removed all the entities from our dataset where diff_price < 0.

In [22]:
# recheck
len(cmo_monthly[cmo_monthly['diff_price'] < 0])

0

It's verified that all the entities have been removed.

Now, let's check the entities where diff_price is equal to zero, i.e. where max_price is equal to min_price.

In [23]:
# diff_price equals to zero
len(cmo_monthly[cmo_monthly['diff_price'] == 0])

5747

There are <b>5747 entities in our dataset where diff_price equals zero</b>, that is, where the minimum price and maximum price are the same.

<b>If the modal price lies between the minimum and maximum price, then 'min_price', 'max_price', and 'modal_price' are all the same in these 5747 entities, meaning no price fluctuation was recorded for them.</b>
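Whether the modal price really matches the other two prices in such rows can be checked directly rather than assumed. The following is a sketch on made-up rows (`toy`, `flat`, and `all_equal` are hypothetical names); the third row deliberately has a modal price that disagrees.

```python
import pandas as pd

# Hypothetical rows: min and max agree in the first three, but the modal
# price of the third row differs, as can happen in inconsistent records.
toy = pd.DataFrame({
    "min_price": [1500, 2000, 3000, 1000],
    "max_price": [1500, 2000, 3000, 1400],
    "modal_price": [1500, 2000, 3100, 1200],
})

# Rows with no spread between minimum and maximum price.
flat = toy[toy["max_price"] == toy["min_price"]]

# Of those, rows where the modal price matches as well.
all_equal = flat[flat["modal_price"] == flat["min_price"]]

print(len(flat), len(all_equal))
```

If the two counts differ on the real data, the rows in between are further candidates for inspection.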

We will know more about these attributes very soon. But now move towards our next attribute.

##### 3.6 arrivals_in_qtl

In [24]:
# Check out quantity arrival data
cmo_monthly['arrivals_in_qtl'].describe()

count     61956.00000
mean       6073.25107
std       34786.45736
min           1.00000
25%          38.00000
50%         213.00000
75%        1372.25000
max     1450254.00000
Name: arrivals_in_qtl, dtype: float64

The above output shows us that the standard deviation is much larger than the mean, which means that our data is widely spread.
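The row counts reported by the `describe()` outputs so far can be cross-checked against the number of rows removed at each cleaning step. The sketch below is plain arithmetic on the figures printed above; the variable names are illustrative.

```python
# Counts copied from the outputs above.
rows_before_cleaning = 62429   # count in the modal_price describe()
zero_modal_rows = 204          # rows with modal_price == 0
negative_diff_rows = 269       # rows with diff_price < 0

after_modal_filter = rows_before_cleaning - zero_modal_rows
after_diff_filter = after_modal_filter - negative_diff_rows

print(after_modal_filter, after_diff_filter)
```

The two results, 62225 and 61956, match the counts shown in the diff_price and arrivals_in_qtl summaries respectively.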

We can also check out the entities where our value is minimum and maximum.

In [26]:
# arrivals_in_qtl is minimum
len(cmo_monthly[cmo_monthly['arrivals_in_qtl'] == 1])

1432

There are a total of <b>1432 entities in our dataset where the 'arrivals_in_qtl' value is equal to 1</b>.

In [27]:
# arrivals_in_qtl is maximum
cmo_monthly[cmo_monthly['arrivals_in_qtl'] == 1450254]

Unnamed: 0,APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name,diff_price
44313,Manchar,Methi (Bhaji),2015,November,1450254,228,1019,778,2015-11-01,Pune,791


There is only <b>1 entity in our dataset where 'arrivals_in_qtl' is maximum</b>.

We will learn more about them later, but first let's check our other attributes.

##### 3.7 Month

In [28]:
# Check out unique months
cmo_monthly['Month'].value_counts().sort_values(ascending = False)

November     7143
October      6824
September    6458
June         4865
January      4832
December     4781
February     4585
May          4559
August       4534
March        4495
July         4479
April        4401
Name: Month, dtype: int64

<b>The dataset contains all 12 months. Most of our entities belong to November and October (autumn season), and the fewest belong to April and July (spring and summer season).</b>
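Sorting by count makes the busiest months easy to spot, but seasonality is easier to read in calendar order. As a sketch, the counts from the output above can be reordered with `reindex`; `month_counts` and `calendar_order` are illustrative names.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
month_counts = pd.Series({
    "November": 7143, "October": 6824, "September": 6458, "June": 4865,
    "January": 4832, "December": 4781, "February": 4585, "May": 4559,
    "August": 4534, "March": 4495, "July": 4479, "April": 4401,
})

# Explicit calendar order, so the result does not depend on the locale.
calendar_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
by_calendar = month_counts.reindex(calendar_order)

print(by_calendar)
```

In this order the rise from September to November stands out against the flatter counts of the other months.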

##### 3.8 Year

In [29]:
# check out unique years
cmo_monthly['Year'].value_counts().sort_values(ascending = False)

2016    28915
2015    25271
2014     7770
Name: Year, dtype: int64

<b>The largest share of our data (about 47%) belongs to the year 2016.</b>
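The share of each year can be worked out from the counts above. This is a small sketch that only reuses the printed figures; `year_counts` and `year_share` are illustrative names.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
year_counts = pd.Series({2016: 28915, 2015: 25271, 2014: 7770})

# Share of rows per year, in percent.
year_share = (year_counts / year_counts.sum() * 100).round(1)

print(year_share)
```

2016 accounts for a little under half of the rows, 2015 for roughly two fifths, and 2014 for the remainder. On the full frame, `value_counts(normalize=True)` would give the same proportions directly.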

##### 3.9 Commodity

Now comes the main part. Let's learn more about the different commodities in our dataset. From our past data analysis notebook we know that our dataset contains <b>352 unique commodities</b>. But some of the commodity names are in all caps.

Let's convert all the commodity names to lowercase and then recheck how many unique commodities we have in our dataset.

In [30]:
# convert all the commodities name to lowercase
cmo_monthly['Commodity'] = cmo_monthly['Commodity'].str.lower()

In [31]:
# check unique commodities
uniq_com = cmo_monthly['Commodity'].unique()
uniq_com

array(['bajri', 'wheat(husked)', 'sorgum(jawar)', 'maize', 'gram',
       'horse gram', 'matki', 'pigeon pea (tur)', 'black gram',
       'castor seed', 'soybean', 'jaggery', 'lemon', 'ginger (fresh)',
       'potato', 'ladies finger', 'flower', 'carrot', 'cluster bean',
       'ghevda', 'ghosali(bhaji)', 'mango(raw)', 'cucumber', 'onion',
       'bitter gourd', 'cabbage', 'garlic', 'math (bhaji)', 'capsicum',
       'tomato', 'brinjal', 'tamarind', 'tamarind seed',
       'coriander (dry)', 'green chilli', 'chillies(red)', 'mustard',
       'paddy-unhusked', 'hilda', 'chikoo', 'cotton',
       'ground nut pods (dry)', 'pomegranate', 'papai', 'melon',
       'beet root', 'bottle gourd', 'dhemse', 'coriander ', 'coriander  ',
       'spinach', 'shevga', 'small gourd', 'grapes', 'kharbuj',
       'green gram', 'sunflower', 'safflower', 'mango', 'water melon',
       'mosambi', 'orange', 'fenugreek', 'cowpea', 'green peas (dry)',
       'squash gourd', 'maize (corn.)', 'chino', 'curry lea

In [32]:
# Replace '/' in two category names so that they do not cause bugs later.
d = {'bhagar/vari' : 'bhagar-vari', 'thymol/lovage': 'thymol-lovage'}
cmo_monthly['Commodity'] = cmo_monthly['Commodity'].replace(d)

In [33]:
uniq_com = cmo_monthly['Commodity'].unique()
len(uniq_com)

204

<b>It looks like we now have far fewer unique commodity names: only 204, compared with 352 previously.</b>
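The unique values printed above still include 'coriander ' and 'coriander  ', which differ only in trailing spaces. A possible extra normalization step, sketched here on a few values taken from that output, is to trim whitespace as well as lowercase. This step is not part of the notebook, and its effect on the count of 204 has not been verified.

```python
import pandas as pd

# Values taken from the unique() output above, including the two
# 'coriander' spellings that differ only in trailing spaces.
names = pd.Series(["coriander ", "coriander  ", "spinach", "coriander (dry)"])

# Lowercase, trim the ends, and collapse repeated inner whitespace.
normalized = (
    names.str.lower()
         .str.strip()
         .str.replace(r"\s+", " ", regex=True)
)

print(normalized.unique().tolist())
```

After trimming, the two 'coriander' variants collapse into one name, while 'coriander (dry)' stays a separate commodity.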

In [34]:
# Count the top five commodities
cmo_monthly['Commodity'].value_counts().sort_values(ascending = False).head()

gram                4092
wheat(husked)       4060
soybean             3694
sorgum(jawar)       3689
pigeon pea (tur)    3453
Name: Commodity, dtype: int64

<b>The top five commodity names by number of entities are the same as in our previous analysis.</b>

##### 3.10 APMC

We previously found that we have a total of <b>349 unique APMC categories</b>. The top five APMC categories that appear in the largest number of entities are:

In [35]:
# Count the top five APMC
cmo_monthly['APMC'].value_counts().sort_values(ascending = False).head()

Mumbai     1536
Pune       1512
Nagpur     1340
Barshi     1076
Jalgaon    1055
Name: APMC, dtype: int64

<b>We have looked at each attribute individually. Let's dig deeper, group our data by the APMC and Commodity attributes, and then do more analysis.</b>

#### 4. Analysis of each APMC and Commodity data

In [36]:
# Set APMC and Commodity attributes as the index
cmo_monthly.set_index(['APMC', 'Commodity'], inplace = True)
cmo_monthly.head()

APMC,Commodity,Year,Month,arrivals_in_qtl,min_price,max_price,modal_price,date,district_name,diff_price
Ahmednagar,bajri,2015,April,79,1406,1538,1463,2015-04-01,Ahmadnagar,132
Ahmednagar,bajri,2016,April,106,1788,1925,1875,2016-04-01,Ahmadnagar,137
Ahmednagar,wheat(husked),2015,April,1253,1572,1890,1731,2015-04-01,Ahmadnagar,318
Ahmednagar,wheat(husked),2016,April,387,1750,2220,1999,2016-04-01,Ahmadnagar,470
Ahmednagar,sorgum(jawar),2015,April,3825,1600,2200,1900,2015-04-01,Ahmadnagar,600


Now, our data is indexed by the APMC and Commodity attributes.

We have a total of 349 unique APMC names and 204 unique commodity names, and we need to find the trend or seasonality in each APMC and Commodity pair. So let's create a separate csv file for each APMC and commodity pair, with the rows sorted by date. This will make it easier to analyse each pair.

In [37]:
# list down each APMC
uniq_APMC = ['Ahmednagar', 'Akole', 'Jamkhed', 'Kopargaon', 'Newasa',
       'Newasa-Ghodegaon', 'Parner', 'Pathardi', 'Rahata', 'Rahuri',
       'Rahuri-Vambori', 'Sangamner', 'Shevgaon', 'Shevgaon-Bodhegaon',
       'Shrirampur', 'Shrirampur-Belapur', 'Shrigonda',
       'Shrigonda-Ghogargaon', 'Karjat (A- Nagar)', 'Rahuri-Songaon',
       'Akola', 'Akot', 'Balapur', 'Murtizapur', 'Patur', 'Telhara',
       'Barshi Takli', 'Achalpur', 'Amarawati',
       'Amarawati-Fruit And Vegetables', 'Anajngaon Surji',
       'Chandur Bajar', 'Chandur Rly.', 'Daryapur', 'Dhamangaon-Rly',
       'Dharni', 'Morshi', 'Nandgaon Khandeshwar', 'Varud',
       'Varud-Rajura Bazar', 'Tiwasa', 'Aurangabad', 'Fulambri',
       'Gangapur', 'Kannad', 'Lasur Station', 'Paithan', 'Sillod',
       'Sillod-Bharadi', 'Soygaon', 'Vaijapur', 'Khultabad', 'Ambejogai',
       'Beed', 'Gevrai', 'Kada', 'Kada (Ashti)', 'Kej', 'Kille Dharur',
       'Majalgaon', 'Parli-Vaijnath', 'Bhandara', 'Lakhandur', 'Lakhani',
       'Pavani', 'Tumsar', 'Buldhana', 'Buldhana-Dhad', 'Chikhali',
       'Deulgaon Raja', 'Jalgaon Jamod', 'Jalgaon Jamod-Aasalgaon',
       'Khamgaon', 'Lonar', 'Malkapur', 'Mehkar', 'Nandura', 'Sangrampur',
       'Sangrampur-Varvatbakal', 'Shegaon', 'Sindkhed Raja', 'Brahmpuri',
       'Chandrapur', 'Chandrapur-Ganjwad', 'Chimur', 'Gondpimpri',
       'Korpana', 'Mul', 'Nagbhid', 'Pombhurni', 'Rajura', 'Savali',
       'Sindevahi', 'Varora', 'Bhadrawati', 'Dhule', 'Dondaicha',
       'Dondaicha-Sindkheda', 'Sakri', 'Shirpur', 'Aheri', 'Armori',
       'Armori-Desaiganj', 'Chamorshi', 'Gadchiroli', 'Sironcha',
       'Aamgaon', 'Arjuni Morgaon', 'Gondiya', 'Sadak Arjuni', 'Tiroda',
       'Goregaon', 'Devri', 'Akhadabalapur', 'Basmat', 'Hingoli',
       'Jawala-Bajar', 'Kalamnuri', 'Sengaon', 'Basmat (Kurunda)',
       'Hingoli-Kanegaon Naka', 'Amalner', 'Bhusaval', 'Bodwad-Varangaon',
       'Chalisgaon', 'Chopda', 'Dharangaon', 'Jalgaon', 'Jalgaon-Masawat',
       'Jamner', 'Jamner-Neri', 'Parola', 'Raver', 'Yawal', 'Pachora',
       'Raver-Sawada', 'Ambad(Vadi Godri)', 'Bhokardan',
       'Bhokardan-Pimpalgaon Renu', 'Ghansawangi', 'Jafrabad', 'Jalna',
       'Jalna-Badnapur', 'Mantha', 'Partur', 'Ashti (Jalna)',
       'Gadhinglaj', 'Kolhapur', 'Kolhapur-Laxmipuri',
       'Kolhapur-Malkapur', 'Vadgaon Peth', 'Ahmedpur', 'Aurad Shahajani',
       'Ausa', 'Chakur', 'Devani', 'Jalkot', 'Latur', 'Latur-Murud',
       'Udgir', 'Nilanga', 'Mumbai', 'Mumbai-Fruit Market',
       'Mumbai-Onion And Potato Mkt', 'Bhiwapur', 'Hingna', 'Kalmeshwar',
       'Kamthi', 'Katol', 'Mandhal', 'Nagpur', 'Narkhed', 'Parshiwani',
       'Ramtek', 'Savner', 'Umared', 'Mauda', 'Bhokar', 'Hadgaon',
       'Hadgaon-Tamsa', 'Kinwat', 'Loha', 'Mahur', 'Nanded', 'Umari',
       'Deglur', 'Naigaon', 'Dharmabad', 'Hanegaon', 'Himayatnagar',
       'Kundalwadi', 'Mudkhed', 'Mukhed', 'Biloli', 'Kuntur', 'Akkalkuwa',
       'Nandurbar', 'Navapur', 'Shahada', 'Taloda', 'Dhadgaon',
       'Chandvad', 'Devala', 'Dindori', 'Dindori-Vani', 'Ghoti', 'Kalvan',
       'Lasalgaon', 'Lasalgaon-Niphad', 'Lasalgaon-Vinchur', 'Malegaon',
       'Manmad', 'Nampur', 'Nashik', 'Nashik-Devlali',
       'Pimpalgaon (B)-Saykheda', 'Pimpalgaon Basawant', 'Satana',
       'Sinner', 'Umrane', 'Yeola', 'Nandgaon', 'Kalamb (Os)', 'Murum',
       'Osmanabad', 'Paranda', 'Tuljapur', 'Umarga', 'Lohara',
       'Washi(Osmanabad)', 'Bhoom', 'Gangakhed', 'Jintur', 'Jintur-Bori',
       'Palam', 'Parbhani', 'Pathari', 'Purna', 'Sailu', 'Sonpeth',
       'Manwat', 'Bori', 'Tadkalas', 'Baramati', 'Dound', 'Indapur',
       'Indapur (Nimgaon Ketki)', 'Indapur-Bhigwan', 'Junnar',
       'Junnar (Alephata)', 'Junnar (Narayangaon)', 'Junnar-Otur',
       'Khed (Shel Pimpalgaon)', 'Khed-Chakan', 'Manchar', 'Nira',
       'Nira-Saswad', 'Pune', 'Pune-Manjri', 'Pune-Pimpri', 'Shirur',
       'Junnar (Bhlhe)', 'Khed', 'Pune-Moshi', 'Bhor', 'Talegaon Dabhade',
       'Alibag', 'Karjat (Raigad)', 'Mangaon(Bhadav)', 'Murud', 'Panvel',
       'Pen', 'Roha', 'Khalapur(Shil-Phata)', 'Mahad', 'Ratanagari',
       'Atpadi', 'Islampur', 'Sangali', 'Sangli-Miraj',
       'Sangli-Phale Bhajipalam', 'Tasgaon', 'Palus', 'Vita', 'Shirala',
       'Karad', 'Koregaon', 'Lonand', 'Patan', 'Phaltan', 'Satara',
       'Vaduj', 'Vai', 'Akkolkot', 'Akluj', 'Barshi', 'Barshi-Vairag',
       'Dudhani', 'Karmala', 'Kurdwadi', 'Kurdwadi-Modnimb',
       'Mangalwedha', 'Mohol', 'Pandharpur', 'Sangola', 'Solapur',
       'Bhivandi', 'Kalyan', 'Murbad', 'Palghar(Bevur)', 'Shahapur',
       'Ulhasnagar', 'Vasai', 'Kalyan (Cattle Market)', 'Arvi',
       'Ashti (Wardha)', 'Ashti-Karanja', 'Hinganghat', 'Pulgaon',
       'Samudrapur', 'Sindi', 'Sindi (Selu)', 'Wardha', 'Karanja',
       'Malegaon (Washim)', 'Mangrulpeer', 'Manora', 'Risod', 'Washim',
       'Washim-Ansing', 'Aarni', 'Babhulgaon', 'Digras', 'Ghatanji',
       'Kalamb (Yawatmal)', 'Mahagaon', 'Ner Parasopant', 'Pandharkawada',
       'Pusad', 'Ralegaon', 'Umarkhed', 'Umarkhed-Danki', 'Vani',
       'Yeotmal', 'Zarijamini', 'Maregaon', 'Bori Arab', 'Darwha',
       'Vadvani']

In [38]:
os.chdir(os.path.join(os.getcwd(), 'apmc_commodity'))

In [39]:
cmo_monthly = cmo_monthly.sort_values(by = ['date'])

In [40]:
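# Write one csv file per (APMC, Commodity) pair into the current directory
def apmc_commodity():
    for apmc in uniq_APMC:
        # All rows for this APMC; the remaining index level is Commodity
        apmc_rows = cmo_monthly.loc[apmc]
        apmc_comms = apmc_rows.index.unique().tolist()
        for comm in apmc_comms:
            filename = apmc + '_' + comm + '.csv'
            # Selecting with a list keeps a DataFrame even for a single row
            comms = apmc_rows.loc[[comm]]
            try:
                comms.to_csv(filename, sep = ',', encoding = 'utf-8')
            except Exception as e:
                print(e)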

In [41]:
apmc_commodity()

<b>That's it. We have now created a separate csv file for each APMC and commodity pair, stored inside the apmc_commodity folder under its respective APMC_Commodity name.</b>

This makes it easy for us to visualize them separately. <b>Total files created are 4,830</b>.

<b>The analysis of each APMC and Commodity pair is shown in a separate notebook (APMC_Commodity Analysis notebook).</b>
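The same per-pair export can also be expressed with `groupby`, which avoids the hard-coded APMC list. The following is a sketch under stated assumptions, not the notebook's method: `toy` is a made-up frame with APMC and Commodity as ordinary columns, and the files go to a temporary folder rather than apmc_commodity.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows; the real frame has many more columns and pairs.
toy = pd.DataFrame({
    "APMC": ["Ahmednagar", "Ahmednagar", "Pathardi"],
    "Commodity": ["bajri", "bajri", "cotton"],
    "date": pd.to_datetime(["2016-04-01", "2015-04-01", "2015-02-01"]),
    "modal_price": [1875, 1463, 3798],
})

out_dir = tempfile.mkdtemp()  # throwaway folder instead of apmc_commodity
written = []

# groupby yields each (APMC, Commodity) pair once, always as a DataFrame.
for (apmc, comm), group in toy.groupby(["APMC", "Commodity"]):
    path = os.path.join(out_dir, "{}_{}.csv".format(apmc, comm))
    group.sort_values("date").to_csv(path, index=False, encoding="utf-8")
    written.append(os.path.basename(path))

print(sorted(written))
```

When APMC and Commodity are index levels, as they are at this point in the notebook, `groupby(level=['APMC', 'Commodity'])` would be the analogous call.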