Goal: Plot information about the winners of the 2019 Lok Sabha elections on a map.
Requirements:
- A list of constituencies, candidates, and related information. This will be scraped from the myneta.info website.
- Coordinates of each constituency, to locate it on the map.
- A plot created with the pydeck library.

### Step 1: Scrape the table from the site

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://myneta.info/LokSabha2019/index.php?action=summary&subAction=winner_analyzed&sort=candidate#summary'

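r = requests.get(url)
r.raise_for_status()  # stop early if the request failed
html = r.text

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {"class": "w3-table w3-bordered"})
rows = table.find_all('tr')
data = []
# Skip the header row, then collect the non-empty cell text of each row
for row in rows[1:]:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])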

df = pd.DataFrame(data, columns=['Sno', 'Candidate', 'Constituency', 'Party', 'Criminal Case', 'Education', 'Total Assets', 'Liabilities'])
df.head(5)


Unnamed: 0,Sno,Candidate,Constituency,Party,Criminal Case,Education,Total Assets,Liabilities
0,1,A M Ariff,ALAPPUZHA,CPI(M),2,Graduate Professional,"Rs 1,52,68,906 ~ 1 Crore+","Rs 22,20,700 ~ 22 Lacs+"
1,2,A Narayanaswamy,CHITRADURGA,BJP,0,Graduate,"Rs 9,61,97,642 ~ 9 Crore+",Rs 0 ~
2,3,A. Raja,NILGIRIS,DMK,6,Graduate Professional,"Rs 4,95,91,024 ~ 4 Crore+","Rs 14,24,914 ~ 14 Lacs+"
3,4,Abdul Khaleque,BARPETA,INC,0,Post Graduate,"Rs 73,98,753 ~ 73 Lacs+","Rs 27,03,693 ~ 27 Lacs+"
4,5,Abhishek Banerjee,DIAMOND HARBOUR,AITC,0,Graduate,"Rs 1,37,94,320 ~ 1 Crore+",Rs 0 ~
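
The Total Assets and Liabilities columns are scraped as text such as "Rs 1,52,68,906 ~ 1 Crore+". If numeric values are needed later (for example to scale markers on the map, which is an assumption about how the data will be used), the exact rupee amount can be pulled out of each string. The sketch below is one possible way to do it; the helper name parse_rupees is hypothetical and not part of the original notebook.

```python
import re

def parse_rupees(text):
    """Return the exact rupee amount in a string like 'Rs 1,52,68,906 ~ 1 Crore+'.

    Returns None when no amount is found.
    """
    # Capture the digits and commas that follow "Rs"; ignore the "~ 1 Crore+" summary
    match = re.search(r"Rs\s*(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

# Sample values copied from the table above
assets = parse_rupees("Rs 1,52,68,906 ~ 1 Crore+")
no_liabilities = parse_rupees("Rs 0 ~")
```

On a DataFrame this could be applied column-wise, for example df['Total Assets'].apply(parse_rupees).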


### Save the table to a csv file

In [2]:
df.to_csv(path_or_buf='ls2019_winner_data.csv', index=False)

### Step 2: Find the coordinates

The coordinates can be extracted with the geopy library, which provides a simple interface to geolocation APIs.

Set up the geolocator service and test it on one of the constituencies.

In [3]:
from geopy.geocoders import Nominatim, Bing
import os

# Alternative: the Nominatim service, which needs no API key
# geolocator = Nominatim(user_agent="map_play")
# The Bing API key is read from the BING_API_KEY environment variable
geolocator = Bing(api_key=os.environ.get('BING_API_KEY'))

# Test the geocoder on a single constituency name
location = geolocator.geocode("ALAPPUZHA")
print(location.address)
print((location.latitude, location.longitude))


Alappuzha, KL, India
(9.48480034, 76.32230377)


Get the coordinates from the API for each constituency. This takes some time (around 30 minutes for all constituencies).

In [4]:
from geopy.extra.rate_limiter import RateLimiter
# Wait at least 1 second between requests and retry a failed request up to 5 times
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1, max_retries=5)
df['location'] = df['Constituency'].apply(geocode)
df['address'] = df['location'].apply(lambda x: x.address if x else x)
# Note: the next two lines pass each location through the geocoder again before reading the coordinate
df['latitude'] = df['location'].apply(geocode).apply(lambda x: x.latitude if x else x)
df['longitude'] = df['location'].apply(geocode).apply(lambda x: x.longitude if x else x)

RateLimiter caught an error, retrying (0/5 tries). Called with (*(Location(Surat, GJ, India, (21.20350838, 72.83922577, 0.0)),), **{}).
Traceback (most recent call last):
  File "/home/bhupender/anaconda3/envs/map_play/lib/python3.11/site-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bhupender/anaconda3/envs/map_play/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/home/bhupender/anaconda3/envs/map_play/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bhupender/anaconda3/envs/map_play/lib/python3.11/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
     

Check the location extracted by the API.

In [5]:
df.head()

Unnamed: 0,Sno,Candidate,Constituency,Party,Criminal Case,Education,Total Assets,Liabilities,location,address,latitude,longitude
0,1,A M Ariff,ALAPPUZHA,CPI(M),2,Graduate Professional,"Rs 1,52,68,906 ~ 1 Crore+","Rs 22,20,700 ~ 22 Lacs+","(Alappuzha, KL, India, (9.48480034, 76.32230377))","Alappuzha, KL, India",9.4848,76.322304
1,2,A Narayanaswamy,CHITRADURGA,BJP,0,Graduate,"Rs 9,61,97,642 ~ 9 Crore+",Rs 0 ~,"(Chitradurga, India, (14.22999954, 76.40000153))","Chitradurga, India",14.23,76.400002
2,3,A. Raja,NILGIRIS,DMK,6,Graduate Professional,"Rs 4,95,91,024 ~ 4 Crore+","Rs 14,24,914 ~ 14 Lacs+","(TN, India, (11.45571995, 76.64025116))","TN, India",36.330116,-88.261971
3,4,Abdul Khaleque,BARPETA,INC,0,Post Graduate,"Rs 73,98,753 ~ 73 Lacs+","Rs 27,03,693 ~ 27 Lacs+","(Barpeta, AS, India, (26.3295002, 91.00610352))","Barpeta, AS, India",26.3295,91.006104
4,5,Abhishek Banerjee,DIAMOND HARBOUR,AITC,0,Graduate,"Rs 1,37,94,320 ~ 1 Crore+",Rs 0 ~,"(Diamond Harbour, WB, India, (22.19249916, 88....","Diamond Harbour, WB, India",22.192499,88.189499


Check for missing values

In [8]:
df.isna().sum()

Sno              0
Candidate        0
Constituency     0
Party            0
Criminal Case    0
Education        0
Total Assets     0
Liabilities      0
location         4
address          4
latitude         2
longitude        2
dtype: int64

In [10]:
df.loc[df['location'].isna()]

Unnamed: 0,Sno,Candidate,Constituency,Party,Criminal Case,Education,Total Assets,Liabilities,location,address,latitude,longitude
29,30,Annasaheb Shankar Jolle,CHIKKODI,BJP,0,12th Pass,"Rs 34,49,22,831 ~ 34 Crore+","Rs 19,55,95,693 ~ 19 Crore+",,,44.934242,7.541259
79,80,Brij Bhushan Sharan Singh,KAISERGANJ,BJP,4,Graduate Professional,"Rs 9,89,05,402 ~ 9 Crore+","Rs 6,15,24,736 ~ 6 Crore+",,,44.934242,7.541259
215,216,John Barla,ALIPURDUARS,BJP,9,8th Pass,"Rs 14,18,730 ~ 14 Lacs+",Rs 0 ~,,,44.934242,7.541259
345,346,Pradyut Bordoloi,NAWGONG,INC,0,Post Graduate,"Rs 7,41,43,272 ~ 7 Crore+","Rs 40,11,152 ~ 40 Lacs+",,,44.934242,7.541259


In [11]:
df.loc[df['latitude'].isna()]

Unnamed: 0,Sno,Candidate,Constituency,Party,Criminal Case,Education,Total Assets,Liabilities,location,address,latitude,longitude
243,244,Kiren Rijiju,ARUNACHAL WEST,BJP,0,Graduate Professional,"Rs 1,52,79,000 ~ 1 Crore+",Rs 0 ~,"(AR, India, (27.59338951, 96.1073761))","AR, India",,
504,505,Tapir Gao,ARUNACHAL EAST,BJP,0,Post Graduate,"Rs 13,66,28,259 ~ 13 Crore+",Rs 0 ~,"(AR, India, (27.59338951, 96.1073761))","AR, India",,


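The locations, latitudes, and longitudes above are inaccurate or differ from the expected values in several rows. The main cause is the difference in spelling between the constituency name and the place name the API recognizes: some names return no location at all, and others resolve only to the state (for example "TN, India" for NILGIRIS). In addition, the latitude and longitude columns come from a second geocoder call on the location objects rather than from the coordinates already stored in them, so they do not always agree with the location column. For the same reasons, every extracted location and coordinate would need to be checked. Hence a more robust way is needed to get the coordinates from the constituency names.
Since Wikipedia presents information about every constituency along with its coordinates, the Wikipedia pages are searched and scraped for this information in the wiki_scrape Jupyter notebook.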
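
Part of the checking described above can be automated. The following is a minimal sketch, not the method used in this notebook or in wiki_scrape: it reads the coordinates directly from a location object and flags rows whose coordinates are missing or fall outside a rough bounding box for India. The bounding box values and the helper names coords_from_location and outside_india are assumptions made for illustration; the sample rows are copied from the tables above.

```python
from types import SimpleNamespace

import pandas as pd

# Rough bounding box for India (an approximation chosen for this sketch)
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)

def coords_from_location(loc):
    """Return (latitude, longitude) stored in a geopy-style location, or (None, None)."""
    if loc is None or not hasattr(loc, "latitude"):
        return (None, None)
    return (loc.latitude, loc.longitude)

def outside_india(frame):
    """Return the rows whose coordinates are missing or outside the bounding box."""
    lat_ok = frame["latitude"].between(*LAT_RANGE)
    lon_ok = frame["longitude"].between(*LON_RANGE)
    # between() is False for missing values, so those rows are flagged as well
    return frame.loc[~(lat_ok & lon_ok)]

# Stand-in for a geocoder result, so the sketch runs without network access
fake_location = SimpleNamespace(latitude=9.4848, longitude=76.322304)
lat, lon = coords_from_location(fake_location)

# Sample rows taken from the outputs shown earlier
sample = pd.DataFrame({
    "Constituency": ["ALAPPUZHA", "NILGIRIS", "CHIKKODI", "ARUNACHAL WEST"],
    "latitude": [9.4848, 36.330116, 44.934242, None],
    "longitude": [76.322304, -88.261971, 7.541259, None],
})
suspect = outside_india(sample)
print(suspect["Constituency"].tolist())
```

A bounding box only catches gross errors. A constituency that resolves to the wrong place inside India would still pass, so a manual review or a better source remains necessary.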