Folium maps are not visible on GitHub. To see a copy of this notebook on Google Colab please click the following link:

https://colab.research.google.com/drive/1Hv9iHNCBtfVvX0tz-3Yvy4yCZc9s79Wy

# Introduction #

India is the 5th largest economy in the world and India's automotive industry is the fourth largest in the world. The automotive industry in India accounts for 4% of the country's gross domestic product.

With the onset of emerging automotive technologies, there is enormous scope for Tier 1 automotive suppliers to place themselves in India. Before I describe the business problem I'll be solving, I would like for you to have an idea about the difference between a manufacturer and a tier 1 supplier with respect to the automotive industry.



*The following text has been taken from: https://medium.com/self-driving-cars/the-automotive-supply-chain-explained-d4e74250106f*

#### Manufacturers in the automotive industry

Automotive manufacturers are the brands that everyone knows — Ford and Toyota and BMW and their competitors.
These firms are commonly referred to OEMs (original equipment manufacturers), which is an unfortunate misnomer. While these manufacturers produce some original equipment, their real strength is in designing cars, marketing cars, ordering the parts from suppliers, and assembling the final product.

#### Tier 1 Suppliers in the automotive industry
Companies that supply parts or systems directly to OEMs are called Tier 1 suppliers. Some of these brands are recognizable, like Bosch or Continental. Some of them are less so. Tier 1 suppliers specialize in making “automotive-grade” hardware. This means hardware that withstands the motion, temperature, and longevity demands of OEMs.

**These suppliers usually work with a variety of vehicle companies.**

# Business Problem

The problem for this capstone project is to identify the optimal locations for setting up Tier 1 automotive supplier companies in India. We'll find the optimal locations for companies that supply components for all types of vehicles.

Depending upon the scope and financial strength of the tier 1 company, we'll answer three questions:

1. If a tier 1 company wants to cater products to all automotive manufacturing plants in India, where should it place itself?
2. If a tier 1 company wants to cater products to automotive manufacturing plants in a large part of India, where should it place itself?
3. If a tier 1 company wants to cater to the automotive manufacturing plants in a small part of India, where should it place itself?

# Data and Analysis#

I will be using the following data-sets for our analysis:

1. List of Industrial centers in India
https://en.wikipedia.org/wiki/List_of_industrial_centres_in_India

2. List of vehicle plants in India
https://en.wikipedia.org/wiki/List_of_vehicle_plants_in_India

3. After parsing, cleaning and plotting this data, I'll cluster the data using K-means clustering algorithm and visualize the clusters.

4. Then I'll define the optimal locations to place new Tier 1 supplier companies based on geographic data.

## Importing the required libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import folium
import numpy as np
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

## Parsing the wikipedia pages to get data in readble format
### 1. Getting the details of Industrial Areas in India

In [2]:
base_url = 'https://en.wikipedia.org/wiki/List_of_industrial_centres_in_India'
index = requests.get(base_url).text
soup=BeautifulSoup(index)

table = soup.find('table',{'class':'wikitable sortable'}).get_text()
table=table.split('\n')

df=pd.DataFrame(table)
df.rename(columns={0: 'text'},inplace=True)
df=df[df.text!='']
df=df.reset_index()
df.drop('index',axis=1,inplace=True)

s_no=[]
place = []
state = []
significance = []

for i in range(4,len(df)):
    if i%4==0:
        s_no.append(df.iloc[i][0])
    if i%4==1:
        place.append(df.iloc[i][0])
    if i%4==2:
        state.append(df.iloc[i][0])
    if i%4==3:
        significance.append(df.iloc[i][0])

industrial_centers = pd.DataFrame({'Place': place,'Significance': significance})
industrial_centers.head()

Unnamed: 0,Place,Significance
0,Raipur,"Steel, Iron ore, plywood, containers logistics..."
1,Rudrapur,"Automobile, fmcg, chemical, pharmacuetical,Pre..."
2,Bhilai,"Iron and Steel, Power generation, cement, chem..."
3,Raigarh,"Coal, Iron and Steel, Power generation, cement..."
4,Navi Mumbai,"IT, Textiles, Logistics, MSME"


Since quiet of few of the industries shown here won't have much to do with automobiles, I'll keep only those industrial centers in my data-set that contain the word **'auto', 'electronic'** and **'electric'**.

In [3]:
industrial_centers['Significance']=industrial_centers['Significance'].str.upper()
i1=industrial_centers[industrial_centers.Significance.str.contains('AUTO')]
i2=industrial_centers[industrial_centers.Significance.str.contains('ELECTRONIC')]
i3=industrial_centers[industrial_centers.Significance.str.contains('ELECTRIC')]

i1.append(i2)
i1.append(i3)

industrial_centers=i1.drop_duplicates(subset='Place')
industrial_centers

Unnamed: 0,Place,Significance
1,Rudrapur,"AUTOMOBILE, FMCG, CHEMICAL, PHARMACUETICAL,PRE..."
5,Vijayawada,AUTO PARTS
13,Jamshedpur,"IRON AND STEEL, AUTO PARTS"
19,Kharagpur,"CHEMICALS, MACHINERY, HEAVY METALS, AUTOMOBILE..."
22,Pithampur,"AUTO CLUSTER, MEDICINE, COTTON YARN, AUTO TEST..."
23,Belagavi,"HYDRAULICS, HEAVY TOOLS, AUTOMOTIVE EXPORTS, A..."
26,Noida,"SOFTWARE, ELECTRONIC COMPONENTS,MOBILE PHONES,..."
30,Kanpur,"LEATHER, CHEMICAL, FERTILIZERS, IRON AND STEEL..."
31,Rajkot,"AUTO-COMPONENTS, CASTING AND FORGINGS, JEWELRY..."
39,Ahmedabad,"AUTOMOBILE, ENGINEERING, PHARMACEUTICAL, CHEMI..."


### 2. Getting the data of automobile manufacturing facilities in India

In [4]:
base_url = 'https://en.wikipedia.org/wiki/List_of_vehicle_plants_in_India'
index = requests.get(base_url).text
soup=BeautifulSoup(index)

table = soup.find('table',{'class':'wikitable sortable'}).get_text()
table=table.split('\n')

df=pd.DataFrame(table)
df.rename(columns={0: 'text'},inplace=True)
df=df[df.text!='']
df=df.reset_index()
df.drop('index',axis=1,inplace=True)

state=[]
location = []
manufacturer = []
class1 = []

for i in range(5,len(df)):
    if i%4==1:
        state.append(df.iloc[i][0])
    if i%4==2:
        location.append(df.iloc[i][0])
    if i%4==3:
        manufacturer.append(df.iloc[i][0])
    if i%4==0:
        class1.append(df.iloc[i][0])
plants = pd.DataFrame({'Location': location,'Manufacturer':manufacturer,'Class': class1})
plants.head(10)

Unnamed: 0,Location,Manufacturer,Class
0,Sri City,Isuzu Motors India,Passenger & Commercial vehicles
1,Satyavedu,Hero MotoCorp,Two wheelers
2,Vijayawada[1],Ashok Leyland Limited,Commercial vehicles
3,Sri City,kobelco[2],"Cranes, Excavators & back hoe loaders"
4,Penukonda,Kia Motors,"Sportage , Picanto, Rio"
5,Vijayawada[3],AVERA New & Renewable Energy,Electric Two Wheelers
6,"Kodakachani, Medak District[4]",Deccan Auto,Bus Manufacturing Plant
7,Zahirabad,Mahindra & Mahindra,Commercial vehicles
8,Dharuhera,Hero MotoCorp,Two wheelers
9,Gurgaon,Harley-Davidson India,Two wheelers


There are some square brackets (from wikipedia hyperlinks) in this data. Also, there are some commas followed by region names. This data needs to be cleaned up before further processing.

In [5]:
location=[]
manufacturer=[]

for p in plants['Location']:
    p=p.replace('[','')
    p=p.replace(']','')
    p=''.join([i for i in p if not i.isdigit()])
    p=p.split(',', 1)[0]
    location.append(p)
    
for p in plants['Manufacturer']:
    p=p.replace('[','')
    p=p.replace(']','')
    p=''.join([i for i in p if not i.isdigit()])
    manufacturer.append(p)

plants = pd.DataFrame({'Location': location,'Manufacturer':manufacturer,'Class': class1})    
plants.head(10)


Unnamed: 0,Location,Manufacturer,Class
0,Sri City,Isuzu Motors India,Passenger & Commercial vehicles
1,Satyavedu,Hero MotoCorp,Two wheelers
2,Vijayawada,Ashok Leyland Limited,Commercial vehicles
3,Sri City,kobelco,"Cranes, Excavators & back hoe loaders"
4,Penukonda,Kia Motors,"Sportage , Picanto, Rio"
5,Vijayawada,AVERA New & Renewable Energy,Electric Two Wheelers
6,Kodakachani,Deccan Auto,Bus Manufacturing Plant
7,Zahirabad,Mahindra & Mahindra,Commercial vehicles
8,Dharuhera,Hero MotoCorp,Two wheelers
9,Gurgaon,Harley-Davidson India,Two wheelers


Now the data looks clean. To visualize this, we'll first need to get the coordinates of these places.

## Geocoding data - I am using Google Maps API

Since Foursquare does not work well with locations in rural India, I am using google-maps API.

In [6]:
import googlemaps
gmaps = googlemaps.Client(key='XXXX')

Getting coordinates and region polygons for relevant **Industrial Centers** in India.

In [7]:
industrial_centers_coordinates=[]

for i in industrial_centers['Place']:
    industrial_centers_coordinates.append(gmaps.geocode(i))

print(industrial_centers_coordinates)


[[{'address_components': [{'long_name': 'Rudrapur', 'short_name': 'Rudrapur', 'types': ['locality', 'political']}, {'long_name': 'Udham Singh Nagar', 'short_name': 'Udham Singh Nagar', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Uttarakhand', 'short_name': 'UK', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']}], 'formatted_address': 'Rudrapur, Uttarakhand, India', 'geometry': {'bounds': {'northeast': {'lat': 29.021876, 'lng': 79.43029390000001}, 'southwest': {'lat': 28.9546064, 'lng': 79.3608141}}, 'location': {'lat': 28.9875082, 'lng': 79.4141214}, 'location_type': 'APPROXIMATE', 'viewport': {'northeast': {'lat': 29.021876, 'lng': 79.43029390000001}, 'southwest': {'lat': 28.9546064, 'lng': 79.3608141}}}, 'place_id': 'ChIJRffBZ5V_oDkRrzt4OIdc_t8', 'types': ['locality', 'political']}], [{'address_components': [{'long_name': 'Vijayawada', 'short_name': 'Vijayawada', 'types': ['

Now it would make sense to put these coordinates along with the respective dataframe.

In [8]:
lat=[]
lng=[]
for result in industrial_centers_coordinates:
    lat.append(result[0]['geometry']['location']['lat'])
    lng.append(result[0]['geometry']['location']['lng'])

industrial_centers['Latitude']=lat
industrial_centers['Longitude']=lng
industrial_centers

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Place,Significance,Latitude,Longitude
1,Rudrapur,"AUTOMOBILE, FMCG, CHEMICAL, PHARMACUETICAL,PRE...",28.987508,79.414121
5,Vijayawada,AUTO PARTS,16.506174,80.648015
13,Jamshedpur,"IRON AND STEEL, AUTO PARTS",22.804566,86.202875
19,Kharagpur,"CHEMICALS, MACHINERY, HEAVY METALS, AUTOMOBILE...",22.34601,87.231975
22,Pithampur,"AUTO CLUSTER, MEDICINE, COTTON YARN, AUTO TEST...",22.611121,75.677269
23,Belagavi,"HYDRAULICS, HEAVY TOOLS, AUTOMOTIVE EXPORTS, A...",15.849695,74.497674
26,Noida,"SOFTWARE, ELECTRONIC COMPONENTS,MOBILE PHONES,...",28.535516,77.391026
30,Kanpur,"LEATHER, CHEMICAL, FERTILIZERS, IRON AND STEEL...",26.449923,80.331874
31,Rajkot,"AUTO-COMPONENTS, CASTING AND FORGINGS, JEWELRY...",22.303894,70.80216
39,Ahmedabad,"AUTOMOBILE, ENGINEERING, PHARMACEUTICAL, CHEMI...",23.022505,72.571362


Now getting coordinates manufacturing plants in India.

In [9]:
plants['GeocodeAddress']=plants['Manufacturer']+', '+plants['Location']
plants.head()

Unnamed: 0,Location,Manufacturer,Class,GeocodeAddress
0,Sri City,Isuzu Motors India,Passenger & Commercial vehicles,"Isuzu Motors India, Sri City"
1,Satyavedu,Hero MotoCorp,Two wheelers,"Hero MotoCorp, Satyavedu"
2,Vijayawada,Ashok Leyland Limited,Commercial vehicles,"Ashok Leyland Limited, Vijayawada"
3,Sri City,kobelco,"Cranes, Excavators & back hoe loaders","kobelco, Sri City"
4,Penukonda,Kia Motors,"Sportage , Picanto, Rio","Kia Motors, Penukonda"


In [10]:
plants_coordinates=[]

for i in plants['GeocodeAddress']:
    plants_coordinates.append(gmaps.geocode(i))

print(plants_coordinates)

[[{'address_components': [{'long_name': '3500', 'short_name': '3500', 'types': ['street_number']}, {'long_name': 'Central Expressway', 'short_name': 'Central Expy', 'types': ['route']}, {'long_name': 'Sri City', 'short_name': 'Sri City', 'types': ['locality', 'political']}, {'long_name': 'Chittoor', 'short_name': 'Chittoor', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Andhra Pradesh', 'short_name': 'AP', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'India', 'short_name': 'IN', 'types': ['country', 'political']}, {'long_name': '517646', 'short_name': '517646', 'types': ['postal_code']}], 'formatted_address': '3500, Central Expy, Sri City, Andhra Pradesh 517646, India', 'geometry': {'location': {'lat': 13.5558377, 'lng': 80.0188555}, 'location_type': 'ROOFTOP', 'viewport': {'northeast': {'lat': 13.5571866802915, 'lng': 80.02020448029151}, 'southwest': {'lat': 13.5544887197085, 'lng': 80.01750651970849}}}, 'place_id': 'ChIJWZQF4zZ3TToRXS

Again, merging this data with respective dataframe.

In [11]:
lat=[]
lng=[]

i=0
for result in plants_coordinates:
    if result==[]:
        lat.append('NaN')
        lng.append('NaN')
        continue
    lat.append(result[0]['geometry']['location']['lat'])
    lng.append(result[0]['geometry']['location']['lng'])

In [12]:
plants['Latitude']=lat
plants['Longitude']=lng

plants.head()

Unnamed: 0,Location,Manufacturer,Class,GeocodeAddress,Latitude,Longitude
0,Sri City,Isuzu Motors India,Passenger & Commercial vehicles,"Isuzu Motors India, Sri City",13.5558,80.0189
1,Satyavedu,Hero MotoCorp,Two wheelers,"Hero MotoCorp, Satyavedu",13.5244,79.9697
2,Vijayawada,Ashok Leyland Limited,Commercial vehicles,"Ashok Leyland Limited, Vijayawada",16.4992,80.6485
3,Sri City,kobelco,"Cranes, Excavators & back hoe loaders","kobelco, Sri City",13.5534,80.0127
4,Penukonda,Kia Motors,"Sportage , Picanto, Rio","Kia Motors, Penukonda",14.1637,77.6187


In [13]:
plants[plants['Latitude']=='NaN']

Unnamed: 0,Location,Manufacturer,Class,GeocodeAddress,Latitude,Longitude
19,Amb,International Cars & Motors Limited,Passenger vehicles,"International Cars & Motors Limited, Amb",,
31,Hoshiarpur,International Tractors Limited (Sonalika Group),Farm Tractors & Agri Equipments,International Tractors Limited (Sonalika Group...,,
118,Chakan,Hyundai Construction Equipments,Excavators,"Hyundai Construction Equipments, Chakan",,
126,Wai,"hyosung, Benelli",Two wheelers,"hyosung, Benelli, Wai",,
132,Banda,Caterpillar,Commercial,"Caterpillar, Banda",,


We'll take out the above mentioned rows from the dataframe because these locations could not be geocoded by Google Maps.
Also we can take out the 'GeocodeAddress', as our task of Geocoding is complete.

In [14]:
plants=plants[plants['Latitude']!='NaN']
plants.drop(columns=['GeocodeAddress'],axis=1, inplace=True)

plants.head()

Unnamed: 0,Location,Manufacturer,Class,Latitude,Longitude
0,Sri City,Isuzu Motors India,Passenger & Commercial vehicles,13.5558,80.0189
1,Satyavedu,Hero MotoCorp,Two wheelers,13.5244,79.9697
2,Vijayawada,Ashok Leyland Limited,Commercial vehicles,16.4992,80.6485
3,Sri City,kobelco,"Cranes, Excavators & back hoe loaders",13.5534,80.0127
4,Penukonda,Kia Motors,"Sportage , Picanto, Rio",14.1637,77.6187


In [15]:
plants_recovery_data=plants

I am now cleaning the manufacturing plants data and segregating it into four distinct categories viz. two wheeler manufacturers, passenger vehicle manufacturers, commercial vehicles manufacturers and miscllaneous vehicle manufacturers.

In [16]:
plants['Location']=plants['Location'].str.upper()
plants['Manufacturer']=plants['Manufacturer'].str.upper()
plants['Class']=plants['Class'].str.upper()


two_wheeler_plants=plants[plants['Class'].str.find('TWO WHEELERS')==0]
two_wheeler_plants['Class']='TWO WHEELERS'
plants1=plants[~plants['Class'].str.find('TWO WHEELERS')==0]

passenger_vehicle_plants=plants[plants['Class'].str.find('PASSENGER')==0]
passenger_vehicle_plants['Class']='PASSENGER VEHICLES'
plants1=plants1[~plants1['Class'].str.find('PASSENGER')==0]

commercial_vehicle_plants=plants[plants['Class'].str.find('COMMERCIAL')==0]
commercial_vehicle_plants['Class']='COMMERCIAL VEHICLES'
plants1=plants1[~plants1['Class'].str.find('COMMERCIAL')==0]

other_plants=plants1
other_plants['Class']='MISC'

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # This is added back by InteractiveShellApp.init_path()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  from ipykernel import kernelapp as app


## Visualizing the industrial areas and manufacturing plants
**Note 1: I have downloaded polygons from OpenStreetMap website  to display the industrial areas in the following map. Take your time to explore. This data will not be used for K-means clustering.** <br> <br>
Note 2: I am aware that one pointer is appearing over London - that's how Google reverted. I haven't removed it in the visualization phase, but will remove it for the analysis. Please ignore that pointer and explore the rest.


In [177]:
latitudes_industry=industrial_centers['Latitude'].to_list()
longitudes_industry=industrial_centers['Longitude'].to_list()

latitudes_plants=plants['Latitude'].to_list()
longitudes_plants=plants['Longitude'].to_list()


india_map = folium.Map(location=[20.5937,78.9629], zoom_start=5)
india_map

import os
import json

path='GeoJSON_Polygons/'

for file in os.listdir(path):
    with open(path+file) as json_file:
        polygon_data=json.load(json_file)
        
    folium.GeoJson(polygon_data, name=file.replace('.txt','')+' Industrial Area').add_to(india_map)

two_wheeler_plants_map = folium.map.FeatureGroup(name='Two Wheeler Manufacturing Plants')
passenger_vehicle_plants_map = folium.map.FeatureGroup(name='Passenger Vehicle Manufacturing Plants')
commercial_vehicle_plants_map = folium.map.FeatureGroup(name='Commercial Vehicle Manufacturing Plants')
other_plants_map = folium.map.FeatureGroup('Miscellaneous Automotive Manufacturing Plants')


    
for lat, lng, in zip(two_wheeler_plants.Latitude, two_wheeler_plants.Longitude):
    two_wheeler_plants_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=1))

for lat, lng, in zip(passenger_vehicle_plants.Latitude, passenger_vehicle_plants.Longitude):
    passenger_vehicle_plants_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1))    

for lat, lng, in zip(commercial_vehicle_plants.Latitude, commercial_vehicle_plants.Longitude):
    commercial_vehicle_plants_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=3,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1))

    
for lat, lng, in zip(other_plants.Latitude, other_plants.Longitude):
    other_plants_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=3,
        color='black',
        fill=True,
        fill_color='black',
        fill_opacity=1))
    

india_map.add_child(two_wheeler_plants_map)
india_map.add_child(passenger_vehicle_plants_map)
india_map.add_child(commercial_vehicle_plants_map)
india_map.add_child(other_plants_map)
india_map.add_child(folium.map.LayerControl())


Further cleaning the data - removing the point appearing in London.

In [23]:
passenger_vehicle_plants = passenger_vehicle_plants[passenger_vehicle_plants.Location!='DOLATPORDA']

Merging the segrgated dataframes into one dataframe with complete data.

In [173]:
plants=pd.concat([two_wheeler_plants,passenger_vehicle_plants,commercial_vehicle_plants,other_plants])
plants.head(100)

Unnamed: 0,Location,Manufacturer,Class,Latitude,Longitude
1,SATYAVEDU,HERO MOTOCORP,TWO WHEELERS,13.5244,79.9697
8,DHARUHERA,HERO MOTOCORP,TWO WHEELERS,28.2058,76.7956
9,GURGAON,HARLEY-DAVIDSON INDIA,TWO WHEELERS,28.4323,77.0127
10,GURGAON,HERO MOTOCORP,TWO WHEELERS,28.4656,77.0396
11,FARIDABAD,INDIA YAMAHA MOTOR,TWO WHEELERS,28.3712,77.3132
...,...,...,...,...,...
115,CHAKAN,FORCE MOTORS,COMMERCIAL VEHICLES,18.7422,73.8472
128,NAVI MUMBAI,LIEBHERR,COMMERCIAL VEHICLES,19.0527,73.0079
131,CHAKAN,BEIQI FOTON,COMMERCIAL VEHICLES,18.7632,73.8613
133,THANE,EICHER MOTORS,COMMERCIAL VEHICLES,19.2308,72.9741


If a company can cater to the needs of all the manufacturing plants in India, then the centroid of all the manufacturing plants will be the most ideal location to place such a company. Therefore, before we begin clustring, lets calculate this centroid as well.

In [174]:
country_centroid_latitude = plants.Latitude.mean()
country_centroid_longitude = plants.Longitude.mean()

# K-means Clustering

We'll cluster the geocoded data using K-means clustering for different values of K.
I will manually define the initial centroids for each K.

For all different values of K we use here, we'll get potential markets for tier 1 companies of different sizes. For example, a big tier 1 company like ZF or Bosch can cover a bigger cluster, while a smaller tier 1 company like Magna can cover a smaller cluster.

In [100]:
df=plants
df.drop(['Location','Manufacturer','Class'],axis=1,inplace=True)

**Defining the largest clusters - I am calling these as 'Super Clusters'.
These are the largest clusters a company can target.**

In [102]:
from sklearn.preprocessing import StandardScaler

X = df.values[:,:]
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset

num_clusters = 3

k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(cluster_dataset)
labels = k_means.labels_

df['Supercluster_Labels'] = labels
df.head(5)

Unnamed: 0,Latitude,Longitude,Supercluster_Labels
1,13.5244,79.9697,1
8,28.2058,76.7956,0
9,28.4323,77.0127,0
10,28.4656,77.0396,0
11,28.3712,77.3132,0


**Now definining regional clusters. I am calling these sub-clusters. These areas are potential market for a relatively smaller company.**

In [103]:
X = pd.DataFrame(columns=['Latitude','Longitude'])
X['Latitude']=df['Latitude']
X['Longitude']=df['Longitude']

X = X.values[:,:]
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset

num_clusters = 5

k_means = KMeans(init="k-means++", n_clusters=num_clusters, n_init=12)
k_means.fit(cluster_dataset)
labels = k_means.labels_

df['Subcluster_Labels'] = labels
df.head(5)

Unnamed: 0,Latitude,Longitude,Supercluster_Labels,Subcluster_Labels
1,13.5244,79.9697,1,4
8,28.2058,76.7956,0,0
9,28.4323,77.0127,0,0
10,28.4656,77.0396,0,0
11,28.3712,77.3132,0,0


Supercluster with the most number of manufacturing plants will have the greatest number of business opportunities.
Also, the center of this cluster is most sutable for a large Tier 1 supplier company to place itself.

In [131]:
temp = df.groupby('Supercluster_Labels').count()
temp

Unnamed: 0_level_0,Latitude,Longitude,Subcluster_Labels
Supercluster_Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,34,34,34
1,53,53,53
2,42,42,42


In [132]:
max_opportunities_supercluster=temp['Latitude'].idxmax()
print('Supercluster '+ str(max_opportunities_supercluster)+ ' has the maximum number of business opportunities.')

Supercluster 1 has the maximum number of business opportunities.


In [133]:
temp = df.groupby('Subcluster_Labels').count()
temp

Unnamed: 0_level_0,Latitude,Longitude,Supercluster_Labels
Subcluster_Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,34,34,34
1,19,19,19
2,42,42,42
3,5,5,5
4,29,29,29


In [134]:
max_opportunities_subcluster=temp['Latitude'].idxmax()
print('Sub-cluster '+ str(max_opportunities_subcluster)+ ' has the maximum number of business opportunities.')

Sub-cluster 2 has the maximum number of business opportunities.


In [169]:
best_supercluster_longitude=df[df.Supercluster_Labels==max_opportunities_supercluster].Longitude.mean()
best_supercluster_latitude=df[df.Supercluster_Labels==max_opportunities_supercluster].Latitude.mean()

print(best_supercluster_longitude)
print(best_supercluster_latitude)

79.8883254207547
14.325124584905662


In [137]:
best_subcluster_longitude=df[df.Subcluster_Labels==max_opportunities_subcluster].Longitude.mean()
best_subcluster_latitude=df[df.Subcluster_Labels==max_opportunities_subcluster].Latitude.mean()

In [176]:
india_map = folium.Map(location=[20.5937,78.9629], zoom_start=5)
india_map

l0=df[df.Supercluster_Labels==0]
l1=df[df.Supercluster_Labels==1]
l2=df[df.Supercluster_Labels==2]

l0_map = folium.map.FeatureGroup(name='Super Cluster 0')
l1_map = folium.map.FeatureGroup(name='Super Cluster 1')
l2_map = folium.map.FeatureGroup(name='Super Cluster 2')

for lat, lng, in zip(l0.Latitude, l0.Longitude):
    l0_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=10,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=1))

for lat, lng, in zip(l1.Latitude, l1.Longitude):
    l1_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=10,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1))    

for lat, lng, in zip(l2.Latitude, l2.Longitude):
    l2_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=10,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1))

l0=df[df.Subcluster_Labels==0]
l1=df[df.Subcluster_Labels==1]
l2=df[df.Subcluster_Labels==2]
l3=df[df.Subcluster_Labels==3]
l4=df[df.Subcluster_Labels==4]


l0_sub_map = folium.map.FeatureGroup(name='Sub Cluster 0')
l1_sub_map = folium.map.FeatureGroup(name='Sub Cluster 1')
l2_sub_map = folium.map.FeatureGroup(name='Sub Cluster 2')
l3_sub_map = folium.map.FeatureGroup(name='Sub Cluster 3')
l4_sub_map = folium.map.FeatureGroup(name='Sub Cluster 4')

for lat, lng, in zip(l0.Latitude, l0.Longitude):
    l0_sub_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1))

for lat, lng, in zip(l1.Latitude, l1.Longitude):
    l1_sub_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=1))    

for lat, lng, in zip(l2.Latitude, l2.Longitude):
    l2_sub_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=1))

for lat, lng, in zip(l3.Latitude, l3.Longitude):
    l3_sub_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='magenta',
        fill=True,
        fill_color='magenta',
        fill_opacity=1))
    
for lat, lng, in zip(l4.Latitude, l4.Longitude):
    l4_sub_map.add_child(
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='cyan',
        fill=True,
        fill_color='cyan',
        fill_opacity=1))
    
folium.Marker(location=[country_centroid_latitude, country_centroid_longitude],
              popup='Ideal location to place a company that can cater to all manufacturing plants in India').add_to(india_map)  
    
folium.Marker(location=[best_supercluster_latitude, best_supercluster_longitude],
              popup='Ideal location to place a company that can cater to all manufacturing plants in the largest super-cluster').add_to(india_map)    

folium.Marker(location=[best_subcluster_latitude, best_subcluster_longitude],
              popup='Ideal location to place a company that can cater to all manufacturing plants in the largest sub-cluster').add_to(india_map)

india_map.add_child(l0_map)
india_map.add_child(l1_map)
india_map.add_child(l2_map)   
india_map.add_child(l0_sub_map)
india_map.add_child(l1_sub_map)
india_map.add_child(l2_sub_map)
india_map.add_child(l3_sub_map)
india_map.add_child(l4_sub_map)
india_map.add_child(folium.map.LayerControl())

india_map



Furthermore, since our we have classified the data into 4 types of manufacturing plants, we can further run K-means clustering to find out ideal locations to place companies that cater to specific types of manufacturing plants.