### Match district names in `rice_yield.csv` to district names in shapefile

Let us first load a shapefile of Indian districts as of 2020. As we will see, some district names in the shapefile are spelled slightly differently from those in our agriculture data set `rice_yield.csv`. We will correct these discrepancies by hand to simplify the steps that follow.

In [1]:
# Load relevant packages.
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

In [2]:
# Load shapefile of Indian districts.
shp = gpd.read_file('../Raw_data/India_districts2020.shp')

In [3]:
shp.head()

Unnamed: 0,objectid,statecode,statename,state_ut,distcode,distname,distarea,totalpopul,totalhh,totpopmale,totpopfema,st_areasha,st_lengths,geometry
0,1,5,Uttarakhand,STATE,66,Nainital,4251.0,954605.0,191383.0,493666.0,460939.0,5322546000.0,506182.695952,"POLYGON ((79.52659 29.05543, 79.52550 29.05545..."
1,2,5,Uttarakhand,STATE,60,Dehradun,3088.0,1696694.0,347001.0,892199.0,804495.0,4177236000.0,578188.681639,"POLYGON ((77.87557 30.26052, 77.87467 30.26087..."
2,3,5,Uttarakhand,STATE,64,Almora,3144.0,622506.0,140577.0,291081.0,331425.0,4140751000.0,463454.225766,"POLYGON ((79.28494 29.92735, 79.28495 29.92723..."
3,4,5,Uttarakhand,STATE,65,Champawat,1766.0,259648.0,53953.0,131125.0,128523.0,2294297000.0,314508.010612,"POLYGON ((80.12479 29.01308, 80.12481 29.01306..."
4,5,5,Uttarakhand,STATE,56,Uttarkashi,8016.0,330086.0,67602.0,168597.0,161489.0,10851660000.0,786425.588972,"POLYGON ((78.92267 31.25333, 78.93106 31.26840..."


In [4]:
# Unique district names in shp
shp_distname = np.unique(shp['distname'])
print('No. of districts in shapefile = %d'% (len(shp_distname)))

No. of districts in shapefile = 686


In [5]:
# Load .csv file of agriculture data.
df = pd.read_csv('../Final_data/rice_yield.csv')

# Unique district names in df
df_distname = np.unique(df['Dist Name'])
print('No. of districts in .csv file = %d'% (len(df_distname)))

No. of districts in .csv file = 506


In [6]:
df_distname[:10]

array(['Adilabad', 'Agra', 'Ahmedabad', 'Ahmednagar', 'Ajmer', 'Akola',
       'Alappuzha', 'Aligarh', 'Alirajpur', 'Allahabad'], dtype=object)

Let's look for district names in our `.csv` file that have no exact (case-sensitive) match in the shapefile.

In [7]:
# Names in the .csv file with no exact match in the shapefile.
missing_districts = [dist for dist in df_distname if dist not in shp_distname]
print('Total no. of district names apparently missing in shapefile = %d' % len(missing_districts))

Total no. of district names apparently missing in shapefile = 126
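
Because the comparison above is an exact string match, a difference in case, spacing, or punctuation alone is enough to flag a name. The following is a minimal sketch of a looser comparison that may help separate pure formatting differences from genuine spelling differences. The helpers `normalize` and `unmatched_names` are hypothetical names introduced here, and the toy lists contain only a few names that appear in this section.

```python
def normalize(name):
    """Lower-case a district name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def unmatched_names(csv_names, shp_names):
    """Return the CSV names with no normalized counterpart among the shapefile names."""
    shp_keys = {normalize(n) for n in shp_names}
    return sorted(n for n in csv_names if normalize(n) not in shp_keys)


# Toy inputs for illustration only.
csv_sample = ['Agra', 'Ahmedabad', 'Banaskantha', 'Uttar Kashi']
shp_sample = ['Agra', 'Ahmadabad', 'Banas Kantha', 'Uttarkashi']

# 'Banaskantha' and 'Uttar Kashi' differ from the shapefile only in spacing,
# so they are no longer flagged; 'Ahmedabad' is a real spelling difference.
still_unmatched = unmatched_names(csv_sample, shp_sample)
print(still_unmatched)
```

Names that remain unmatched after normalization still need to be resolved by hand.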


Are these 126 districts really not in the shapefile? For example, let's check for `Ahmedabad` in the shapefile.

In [8]:
shp_distname[:10]

array(['Adilabad', 'Agra', 'Ahmadabad', 'Ahmadnagar', 'Aizawl', 'Ajmer',
       'Akola', 'Alappuzha', 'Aligarh', 'Alirajpur'], dtype=object)

It turns out that our shapefile contains a district `Ahmadabad`, which is just an alternate spelling of Ahmedabad. Let's construct a dictionary matching district names in the shapefile to those in our `.csv` file.

In [9]:
# Dictionary matching shapefile district to .csv district
dict_matches = {'Ahmadabad': 'Ahmedabad',
                'Ahmadnagar': 'Ahmednagar',
                'Almora': 'Almorah',
                'Amravati': 'Amarawati',
                'Amethi': 'Amethi C.S.M.Nagar',
                'Jyotiba Phule Nagar': 'Amroha J.B.Fulenagar',
                'Anugul': 'Angul',
                'Bagalkot': 'Bagalkote',
                'Baghpat': 'Bagpat',
                'Baleshwar': 'Balasore',
                'Banas Kantha': 'Banaskantha',
                'Bangalore Rural': 'Bangalore(Rural)',
                'Bangalore': 'Bangalore(Urban)',
                'Bara Banki': 'Barabanki',
                'Bid': 'Beed',
                'Kaimur (Bhabua)': 'Bhabhua Kaimur',
                'Bathinda': 'Bhatinda',
                'Bilaspur': 'Bilashpur',
                'Balangir': 'Bolangir',
                'Baudh': 'Boudh',
                'Bulandshahr': 'Buland Shahar',
                'Buldana': 'Buldhana',
                'Chamarajanagar': 'Chamaraja Nagar',
                'Purba Champaran': 'Champaran(East)',
                'Pashchim Champaran': 'Champaran(West',
                'Champawat': 'Champavat',
                'Kancheepuram': 'Chengalpattu MGR Kancheepuram',
                'Chikmagalur': 'Chickmagalur',
                'Thoothukkudi': 'Chidambanar Toothukudi',
                'Chikkaballapura': 'Chikkaballapur',
                'Chittaurgarh': 'Chittorgarh',
                'Dohad': 'Dahod',
                'The Dangs': 'Dangs',
                'Dakshin Bastar Dantewada': 'Dantewara',
                'Deoghar': 'Deogarh',
                'Debagarh': 'Devghar Deogarh',
                'Dhaulpur': 'Dholpur',
                'Dindigul': 'Dindigul Anna',
                'Ernakulam': 'Eranakulam',
                'Firozpur': 'Ferozpur',
                'Gautam Buddha Nagar': 'G.B.Nagar',
                'Garhwa': 'Gadva Garhwa',
                'Gondiya': 'Gondia',
                'Hardwar': 'Haridwar',
                'Mahamaya Nagar': 'Hathras',
                'Hisar': 'Hissar',
                'Hydrabad': 'Hyderabad',
                'Janjgir-Champa': 'Janjgir',
                'Jhunjhunun': 'Jhunjhunu',
                'Kadapa(YSR)': 'Kadapa YSR',
                'Uttar Bastar Kanker': 'Kanker',
                'Kanniyakumari': 'Kanyakumari',
                'Karauli': 'Karoli',
                'Kanshiram Nagar': 'Kasganj Khansi Ram Nagar',
                'Kabeerdham': 'Kawardha',
                'Kendujhar': 'Keonjhar',
                'Khandwa (East Nimar)': 'Khandwa',
                'Khargone (West Nimar)': 'Khargone',
                'Kodarma': 'Khodrama Koderma',
                'Khordha': 'Khurda',
                'Kaushambi': 'Kushambi',
                'Kushinagar': 'Kushi Nagar Padrauna',
                'Lohardaga': 'Lohardagga',
                'Mahasamund': 'Mahasmund',
                'Mahrajganj': 'Mahrajgani',
                'Morigaon': 'Marigaon',
                'Mayurbhanj': 'Mayurbhanja',
                'Mahesana': 'Mehsana',
                'Mirzapur': 'Mirzpur',
                'Mumbai': 'Mumbai City',
                'Munger': 'Mungair',
                'Mungeli': 'Mungli',
                'Narsimhapur': 'Narsinghpur',
                'Nashik': 'Nasik',
                'Nabarangapur': 'Nawarangpur',
                'Vellore': 'North Arcot Vellore',
                'Dima Hasao': 'North Cachar Hil',
                'Pakur': 'Pakund Pakur',
                'Palamu': 'Palamau',
                'Panch Mahals': 'Panchmahal',
                'Perambalur': 'Perambular',
                'Erode': 'Periyar(Erode)',
                'Kandhamal': 'Phulbani(Kandhamal)',
                'Pithoragarh': 'Pithorgarh',
                'Purnia': 'Purnea',
                'Rae Bareli': 'Rae - Bareily',
                'Ramanagara': 'Ramanagaram',
                'Ramanathapuram': 'Ramananthapuram',
                'Ramgarh': 'Ramgadh',
                'Rupnagar': 'Roopnagar',
                'Sahibzada Ajit Singh Nagar': 'S.A.S Nagar',
                'Shahid Bhagat Singh Nagar': 'S.B.S Nagar',
                'Sri Potti Sriramulu Nellore': 'S.P.S.Nellore',
                'Sabar Kantha': 'Sabarkantha',
                'Sahibganj': 'Sahebganj',
                'Sant Kabir Nagar': 'Santh Kabir Nagar',
                'Sant Ravidas Nagar (Bhadohi)': 'Santh Ravi Das Nagar Bhadoi',
                'Dumka': 'Santhal Paragana Dumka',
                'Saraikela-Kharsawan': 'Sariakela Kharsawan',
                'Samli': 'Shamli',
                'Sheikhpura': 'Sheikapura',
                'Sheopur': 'Sheopur Kalan',
                'Shimoga': 'Shimoge',
                'Shrawasti': 'Shravasti',
                'Muktsar': 'Shri Mukatsar Sahib',
                'Sivasagar': 'Sibsagar',
                'Siddharthnagar': 'Sidharthnagar',
                'Purbi Singhbhum': 'Singhbhum East',
                'Pashchimi Singhbhum': 'Singhbhum West',
                'Sivaganga': 'Sivagangai Pasumpon',
                'Sonipat': 'Sonepat',
                'Subarnapur': 'Sonepur',
                'Cuddalore': 'South Arcot Cuddalore',
                'Sawai Madhopur': 'Swami Madhopur',
                'Tarn Taran': 'Taran Taran',
                'Tirunelveli': 'Thirunelveli',
                'Tiruppur': 'Thiruppur',
                'Tiruvannamalai': 'Thiruvannamalai',
                'Tiruchirappalli': 'Tiruchirapalli Trichy',
                'Thiruvarur': 'Tiruvarur',
                'Uttarkashi': 'Uttar Kashi',
                'Viluppuram': 'Villupuram',
                'Virudhunagar': 'Virudhunagar Kamarajar',
                'Warangal (R)': 'Warangal',
                'Yadgir': 'Yadagiri',
                'Yavatmal': 'Yeotmal'}
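
A dictionary like the one above is tedious to assemble by eye. As a hedged sketch, the standard-library function `difflib.get_close_matches` can propose a candidate shapefile name for each unmatched `.csv` name. The helper `suggest_matches` and the `cutoff` of 0.8 are assumptions made for illustration, not part of the workflow above, and every suggestion should still be checked by hand.

```python
import difflib


def suggest_matches(csv_names, shp_names, cutoff=0.8):
    """Map each CSV name to its closest shapefile name, or None if nothing clears the cutoff."""
    suggestions = {}
    for name in csv_names:
        close = difflib.get_close_matches(name, shp_names, n=1, cutoff=cutoff)
        suggestions[name] = close[0] if close else None
    return suggestions


# Toy inputs for illustration only.
csv_sample = ['Ahmedabad', 'Ahmednagar', 'Hathras']
shp_sample = ['Ahmadabad', 'Ahmadnagar', 'Mahamaya Nagar', 'Agra']

suggestions = suggest_matches(csv_sample, shp_sample)
for csv_name, shp_name in suggestions.items():
    print(csv_name, '->', shp_name)
```

Note that this sketch maps `.csv` names to shapefile names, the reverse of `dict_matches`. Pairs such as `Mahamaya Nagar` and `Hathras` share almost no characters, so string similarity cannot find them and they have to be matched manually.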

In [10]:
# Replace district names in shp with the spellings used in the .csv file.
for shp_name, csv_name in dict_matches.items():
    shp.loc[shp['distname'] == shp_name, 'distname'] = csv_name
print('Replacement complete')

Replacement complete
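
The loop above applies the renames one at a time, so a mapping in which some new name is also an old name could rename the same rows twice, depending on the order of the entries. Two old names that map to the same new name would also silently merge two districts. The sketch below checks a mapping for both conditions before it is applied; `check_mapping` is a hypothetical helper, shown here on toy dictionaries.

```python
from collections import Counter


def check_mapping(mapping):
    """Return (duplicate_targets, chained) for a rename mapping of old name -> new name."""
    target_counts = Counter(mapping.values())
    # New names that more than one old name maps to.
    duplicate_targets = sorted(t for t, c in target_counts.items() if c > 1)
    # New names that are themselves keys, and so could be renamed again.
    chained = sorted({v for v in mapping.values() if v in mapping})
    return duplicate_targets, chained


# A few entries taken from dict_matches: no problems expected.
sample = {'Ahmadabad': 'Ahmedabad',
          'Deoghar': 'Deogarh',
          'Debagarh': 'Devghar Deogarh'}
ok_result = check_mapping(sample)

# A deliberately broken toy mapping: 'B' is both a new and an old name,
# and 'C' is the target of two different old names.
bad_result = check_mapping({'A': 'B', 'B': 'C', 'D': 'C'})
print(ok_result, bad_result)
```

Running `check_mapping(dict_matches)` before the replacement loop would flag either problem in the full dictionary.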


In [11]:
# Check for any remaining discrepancies in district names between shp and df.
shp_distname = np.unique(shp['distname'])
anymore_disc = [dist for dist in df_distname if dist not in shp_distname]
print('Total no. of district names apparently missing in shapefile = %d' % len(anymore_disc))

Total no. of district names apparently missing in shapefile = 0


That's a relief. Let's now write the updated shapefile to disk.

In [12]:
shp.to_file('../Final_data/districts2020_updated.shp')
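
With the names aligned, the two data sets can be joined on district name. The sketch below uses plain pandas DataFrames as toy stand-ins for `shp` and `df` and shows how `merge` with `indicator=True` reports which districts found a partner. It joins on state and district together, on the assumption that a district name alone may not be unique across states. `State Name` is a hypothetical column name for the `.csv` file, and the sketch further assumes that state names are spelled identically in both sources.

```python
import pandas as pd

# Toy stand-ins for shp and df; only the key columns are included.
shp_toy = pd.DataFrame({'statename': ['Gujarat', 'Uttarakhand', 'Mizoram'],
                        'distname': ['Ahmedabad', 'Uttar Kashi', 'Aizawl']})
df_toy = pd.DataFrame({'State Name': ['Gujarat', 'Uttarakhand'],   # hypothetical column
                       'Dist Name': ['Ahmedabad', 'Uttar Kashi']})

# A left join keeps every shapefile district; the '_merge' column records
# whether a matching row was found in the .csv data.
merged = shp_toy.merge(df_toy, how='left',
                       left_on=['statename', 'distname'],
                       right_on=['State Name', 'Dist Name'],
                       indicator=True)
merge_status = list(merged['_merge'])
print(merged[['statename', 'distname', '_merge']])
```

Rows marked `left_only` are shapefile districts with no record in the `.csv` data, which is expected here because the shapefile lists more district names than the `.csv` file does.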