# Employee and Bus Stops

## Context

A company XYZ intends to provide a bus shuttle service that would help its employees commute to the office. The company is based in Mountain View and the shuttle would provide transportation for employees based in San Francisco.

The city of San Francisco has given the company a list of potential bus stops that it may use. However, the company may use no more than 10 of these bus stops for its shuttle service.

The company XYZ is asking you to come up with the **10 most efficient bus stops** that would best serve its employees. Generally speaking, "efficient" stops minimize the walking distance between the employees' homes and their respective bus stops. To that end, you are given the following data:
- the list of bus stops provided by the city of San Francisco, `Bus_Stops.csv`
- a list of its employees' home addresses, `Employee_Addresses.csv`

Since trying out all possible combinations of 10 bus stops would take a prohibitively long time, the boss of XYZ has told you that you may simplify the problem and come up with 10 reasonable bus stops that are probably efficient.
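
One possible way to simplify the search, sketched here under assumptions, is a greedy heuristic: start with no stops and repeatedly add the candidate that lowers the total distance from each employee to their nearest chosen stop the most. The function name `greedy_select`, the toy planar coordinates, and the use of straight-line Euclidean distance are all illustrative and not part of the task data; real inputs would be geocoded coordinates with a geographic distance.

```python
import math

def greedy_select(employees, stops, k):
    """Greedily pick k stops to reduce total distance to each employee's nearest chosen stop."""
    # best[i] holds employee i's distance to the closest stop chosen so far
    best = [math.inf] * len(employees)
    chosen = []
    for _ in range(min(k, len(stops))):
        top_idx, top_cost, top_best = None, math.inf, None
        for s, stop in enumerate(stops):
            if s in chosen:
                continue
            # distances each employee would have if this stop were added
            trial = [min(b, math.dist(e, stop)) for b, e in zip(best, employees)]
            cost = sum(trial)
            if cost < top_cost:
                top_idx, top_cost, top_best = s, cost, trial
        chosen.append(top_idx)
        best = top_best
    return chosen, sum(best) / len(employees)

# Toy data (hypothetical): three employees near the origin, one far away
employees = [(0, 0), (0, 1), (1, 0), (10, 10)]
stops = [(0, 0), (10, 10), (5, 5)]
chosen, avg = greedy_select(employees, stops, k=2)
print(chosen, avg)
```

A greedy pass evaluates on the order of `k * len(stops)` candidate sets instead of every combination, at the cost of possibly missing the true optimum.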

## Objectives

- Explore and analyze the data. Provide comments on the outputs of your code and document your code well. 
- Feel free to show off your map visualization skills.
- Write an algorithm that produces the 10 best stops in your opinion. Also, please explain the rationale behind the algorithm. 
- Please calculate the average walking distance per employee to their respective stops and report it at the end of your work.
- You may code the solution for this task in either Python or R. If you are coding in Python, you may enter your solution at the bottom of this notebook. Otherwise, you may create a new R Jupyter notebook and copy this problem description over there. Either way, your solution must be in the form of a Jupyter notebook, regardless of the programming language used.
- Submit your work along with the data files used (`Bus_Stops.csv` & `Employee_Addresses.csv`) in a single ZIP file named as follows: `<FirstName>_<LastName>.zip`
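
The average walking distance requested above is the mean, over all employees, of the distance from each home to its nearest selected stop. A minimal sketch follows, assuming the coordinates are already geocoded as `(lat, lon)` pairs and using the haversine great-circle distance as a stand-in for true walking distance. The function names and sample points are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def average_walk_km(homes, stops):
    """Mean distance from each home to its nearest stop."""
    nearest = [min(haversine_km(h, s) for s in stops) for h in homes]
    return sum(nearest) / len(nearest)

# Hypothetical sample points, not taken from the data files
homes = [(37.0, -122.0), (38.0, -122.0)]
stops = [(37.0, -122.0), (40.0, -122.0)]
avg_km = average_walk_km(homes, stops)
print(round(avg_km, 2))
```

Straight-line distance underestimates the real walk along streets, so a routing service would give a more faithful figure where request limits allow.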

## Evaluation

***Your solution will be evaluated on:***
- ***The soundness of the algorithm used to select the bus stops.***
- ***How neat, clear, and well-documented your code is.***
- ***Quality of narrative and commentary with interesting analyses and visuals.***

## Supplementary Notes

Prior to writing the requested algorithm, you will need to *geocode* the employees' home addresses and bus stops. You may use the [HERE REST APIs](https://developer.here.com/develop/rest-apis) for that purpose. Following are some links to help you in your task:
- To generate a free HereMaps account and an API Key to use for geocoding the addresses:  
    - https://developer.here.com/documentation/identity-access-management/dev_guide/topics/plat-using-apikeys.html
- Sections pertaining to *geocoding* in the documentation:  
    - https://developer.here.com/documentation/geocoder/dev_guide/topics/example-geocoding-free-form.html  
    - https://developer.here.com/documentation/geocoder/dev_guide/topics/example-geocoding-intersection.html
- Programmatically perform GET requests
    - Python: https://realpython.com/python-requests/
    - R: https://www.rdocumentation.org/packages/httr/versions/1.4.4

Note that HereMaps allows a maximum of 1000 requests per day, so it will take more than a single day to do all the geocoding. As a start, you may use all bus stops (~120 stops) and a few hundred employee addresses to start developing your algorithm. Save whatever you geocode so you would not need to geocode it again. Once the geocoding is done for all addresses, run the algorithm one last time and finalize your work.
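
Saving geocoding results between runs can be sketched as a small file-backed cache. This is only an illustration: `geocode_with_cache`, the cache file name, and the stand-in `fake_geocode` function are assumptions, and a real run would pass a function that calls the chosen geocoding API.

```python
import json
import os
import tempfile

def geocode_with_cache(addresses, geocode, cache_path="geocode_cache.json"):
    """Geocode each address at most once, persisting results to a JSON file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    for address in addresses:
        if address not in cache:
            # only addresses missing from the cache consume an API request
            cache[address] = geocode(address)
            # write after every lookup so an interrupted run keeps its progress
            with open(cache_path, "w") as f:
                json.dump(cache, f)
    return cache

# Demo with a stand-in geocoder that records how often it is called
calls = []
def fake_geocode(address):
    calls.append(address)
    return [37.0, -122.0]

demo_path = os.path.join(tempfile.mkdtemp(), "geocode_cache.json")
first = geocode_with_cache(["A St", "B St"], fake_geocode, demo_path)
second = geocode_with_cache(["A St", "B St", "C St"], fake_geocode, demo_path)
print(len(calls))  # the second run only looks up "C St"
```

With a daily request quota, the same function can simply be rerun on the full address list each day until every address is cached.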

*Side Note:*  
*The use of HereMaps API for geocoding is just one suggestion. If you are more comfortable using GoogleMaps API or OpenStreetMaps API, then you may use that as well. There are no constraints as to what you may use for geocoding of the addresses.*

<BR>
<BR>
<center><b><u>Finally, note that we will re-run your code (without the geocoding part) to make sure that your work is reproducible.</u></b></center>

<BR>
<center>
<H2>*** GOOD LUCK ***</H2>
</center>

--------

# Importing Libraries

In [34]:
import numpy as np
import pandas as pd
import requests

In [38]:
bus_stops=pd.read_csv("Bus_Stops.csv")
emp_address=pd.read_csv("Employee_Addresses.csv")

In [39]:
bus_stops.nunique()

Street_One      1
Street_Two    119
dtype: int64

# Data Cleaning and Geocoding

In [40]:
final_loc=[0,0]
Base_url='https://nominatim.openstreetmap.org/search?format=json'
temp=requests.get(f"{Base_url}&street=MISSION ST&country=US&city=San Francisco").json()
final_loc[0]=float(temp[0].get('lat'))
final_loc[1]=float(temp[0].get('lon'))
final_loc 

[37.7800823, -122.4098437]

In [None]:
data={
    "Street":emp_address.address.map(lambda m:m.split(",")[0]),
    "City":emp_address.address.map(lambda m:m.split(",")[1]),
    "PO":emp_address.address.map(lambda m:m.split(",")[2]),
    "Country":emp_address.address.map(lambda m:m.split(",")[3])
    }
new_df=pd.DataFrame(data)

In [None]:
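# Addresses with an extra leading field are shifted one column to the right.
# iterrows() yields copies, so assigning to `row` may never reach new_df;
# realign the shifted rows with a boolean mask instead.
shifted=new_df['Country']!=' USA'
new_df.loc[shifted,['Street','City','PO']]=new_df.loc[shifted,['City','PO','Country']].values
new_df.loc[shifted,'Country']='USA'
# Strip the state prefix from the postal code and the padding from the country
new_df['PO']=new_df.PO.map(lambda m:m.replace(" CA ", ""))
new_df['Country']=new_df.Country.map(lambda m:m.replace(" ", ""))
# Drop rows that carry a state but no postal code
new_df=new_df.drop(new_df.loc[new_df.PO==' CA'].index)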
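
The shifting logic above handles addresses that carry one extra leading field. An alternative sketch reads the fields from the right-hand end, so extra leading parts never shift the columns. It assumes every address ends with `<street>, <city>, CA <zip>, USA`, which is inferred from the parsing code rather than confirmed against the file; `split_address` and the sample addresses are made up for illustration.

```python
def split_address(address):
    """Split an address into fields by reading from the right-hand end."""
    parts = [p.strip() for p in address.split(",")]
    country, state_zip, city = parts[-1], parts[-2], parts[-3]
    street = parts[-4] if len(parts) >= 4 else ""
    # state_zip is expected to look like "CA 94110"; keep the trailing postal code if present
    tokens = state_zip.split()
    postal = tokens[-1] if len(tokens) > 1 else ""
    return {"Street": street, "City": city, "PO": postal, "Country": country}

# Hypothetical addresses, one with an extra leading unit field
plain = split_address("123 Example St, San Francisco, CA 94110, USA")
with_unit = split_address("Apt 4, 123 Example St, San Francisco, CA 94110, USA")
print(plain)
print(with_unit)
```

Applied with `emp_address.address.map(split_address)`, this would yield one dictionary per row; addresses with fewer than three comma-separated parts would raise an `IndexError` and need separate handling.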

In [None]:
bs=bus_stops.copy()
bs['lon']=''
bs['lat']=''

In [None]:
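def bus_cor():
  # Geocode each cross street; write back with .at because iterrows() rows are copies
  for index, row in bs.iterrows():
      res=requests.get(f"{Base_url}&street={row['Street_Two']}&country=US&city=San Francisco").json()
      try:
        bs.at[index,'lon']=res[0].get('lon')
        bs.at[index,'lat']=res[0].get('lat')
      except IndexError:
        # no geocoding result for this stop; leave both fields empty
        bs.at[index,'lon']=''
        bs.at[index,'lat']=''
  # Drop the stops that could not be geocoded
  bs.drop(bs.loc[bs.lat==''].index,inplace=True)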

12 of the 119 bus stops returned no geocoding result from the OpenStreetMap (Nominatim) API, either because the street name contains a number or because the street was not found.

In [None]:
def emp_cor():
  new_df['lon']=''
  new_df['lat']=''
  for index, row in new_df.iterrows():
      res=requests.get(f"{Base_url}&street={row['Street']}&country=US&city={row['City']}&postalcode={row['PO']}").json()
      if not res:
          # Street not found: fall back to the city and postal code only
          res=requests.get(f"{Base_url}&country=US&city={row['City']}&postalcode={row['PO']}").json()
      if res:
          # Write back with .at because iterrows() rows are copies
          new_df.at[index,'lon']=res[0].get('lon')
          new_df.at[index,'lat']=res[0].get('lat')

In [None]:
new_df.to_csv('emp.csv')
bs.to_csv('bus.csv')

https://www.section.io/engineering-education/using-geopy-to-calculate-the-distance-between-two-points/

https://towardsdatascience.com/calculating-distance-between-two-geolocations-in-python-26ad3afe287b


https://www.youtube.com/watch?v=PuJ_JUkahXQ

https://jupyter-gmaps.readthedocs.io/en/latest/app_tutorial.html

https://github.com/pbugnion/gmaps

https://www.youtube.com/watch?v=Bne9VASvDoI

From a data science standpoint, I'll use clustering to group the employees by bus stop location and see which clusters are best, so I don't think I'll need

In [22]:
import pandas as pd

In [41]:
bus=pd.read_csv('bus.csv')
emp=pd.read_csv('emp.csv')
bus.drop(columns='Unnamed: 0',inplace=True)
emp.drop(columns='Unnamed: 0',inplace=True)
bus['final_lon']=final_loc[1]
bus['final_lat']=final_loc[0]

In [42]:
pip install pydeck

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


TODO: change this view to show each employee's walking distance to the nearest bus stop.

In [43]:
import pydeck as pdk
DOWNTOWN_BOUNDING_BOX = [
    -122.43135291617365,
    37.766492914983864,
    -122.38706428091974,
    37.80583561830737,
]


def in_bounding_box(point):
    lng, lat = point
    in_lng_bounds = DOWNTOWN_BOUNDING_BOX[0] <= lng <= DOWNTOWN_BOUNDING_BOX[2]
    in_lat_bounds = DOWNTOWN_BOUNDING_BOX[1] <= lat <= DOWNTOWN_BOUNDING_BOX[3]
    return in_lng_bounds and in_lat_bounds


df = bus[bus[["lon", "lat"]].apply(lambda row: in_bounding_box(row), axis=1)]

GREEN_RGB = [0, 255, 0, 40]
RED_RGB = [240, 100, 0, 40]

arc_layer = pdk.Layer(
    "ArcLayer",
    data=df,
    get_width="4",
    get_source_position=["lon", "lat"],
    get_target_position=["final_lon", "final_lat"],
    get_tilt=15,
    get_source_color=RED_RGB,
    get_target_color=GREEN_RGB,
    pickable=True,
    auto_highlight=True,
)

view_state = pdk.ViewState(
    latitude=37.7770041,
    longitude=-122.4144972,
    bearing=120,
    pitch=30,
    zoom=12,
)
r = pdk.Deck(arc_layer,map_style='light',initial_view_state=view_state)
r.to_html("arc_layer.html")


In [44]:
import pandas as pd

In [45]:
emp.lon[0]

-122.427311

In [46]:
import folium
emp['emp_id']=emp_address['employee_id']
temp_emp=emp[['emp_id','lon','lat']]
temp_emp=temp_emp.dropna()
# Named emp_map to avoid shadowing the built-in map()
emp_map = folium.Map(location=[temp_emp.lat.mean(), temp_emp.lon.mean()], zoom_start=14, control_scale=True)
for index, location_info in temp_emp.iterrows():
    folium.Marker([location_info["lat"], location_info["lon"]], popup=location_info["emp_id"]).add_to(emp_map)
emp_map

In [47]:
from geopy.distance import great_circle as GRC

In [147]:
temp_emp['Street_Two_Name']=""
temp_emp['Nearest_bus_lon']=""
temp_emp['Nearest_bus_lat']=""
temp_emp['nearest_bus_stop']=""

In [148]:
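# For each employee, find the closest bus stop by great-circle distance.
for j, employee in temp_emp.iterrows():
  # Reset the running minimum for every employee; a shared minimum would
  # assign almost everyone to the stop of the single closest pair.
  min_dist=float('inf')
  Street_Two_Name=""
  lon=""
  lat=""
  for index, bus_stop in bus.iterrows():
    dist=GRC([employee.lat,employee.lon],[bus_stop.lat,bus_stop.lon]).km
    if dist<min_dist:
      min_dist=dist
      Street_Two_Name=bus_stop.Street_Two
      lon=bus_stop.lon
      lat=bus_stop.lat
  # .at writes into temp_emp directly instead of using chained assignment
  temp_emp.at[j,'Nearest_bus_lon']=lon
  temp_emp.at[j,'Nearest_bus_lat']=lat
  temp_emp.at[j,'nearest_bus_stop']=min_dist  # distance in km
  temp_emp.at[j,'Street_Two_Name']=Street_Two_Name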
  



In [156]:
top_10=pd.DataFrame(temp_emp.Street_Two_Name.value_counts().reset_index()[0:10])
top_10

Unnamed: 0,index,Street_Two_Name
0,MORSE ST,1895
1,15TH ST,174
2,SENECA AVE,85
3,12TH ST,9
4,14TH ST,6
5,PERSIA AVE,5
6,OLIVER ST,4
7,13TH ST,3
8,NEY ST,2
9,COLLEGE AVE,1


In [52]:
temp_emp.nearest_bus_stop=temp_emp.nearest_bus_stop.astype('float')

In [53]:
pip install openrouteservice folium 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Next, find the selected bus stop that is farthest from the destination (`final_loc`).

In [104]:
bus

Unnamed: 0,Street_One,Street_Two,lon,lat,final_lon,final_lat
0,MISSION ST,ITALY AVE,-122.435552,37.716575,-122.409844,37.780082
1,MISSION ST,NEW MONTGOMERY ST,-122.401524,37.788280,-122.409844,37.780082
2,MISSION ST,20TH ST,-122.388577,37.760500,-122.409844,37.780082
3,MISSION ST,FREMONT ST,-122.396994,37.790707,-122.409844,37.780082
4,MISSION ST,13TH ST,-122.413403,37.769475,-122.409844,37.780082
...,...,...,...,...,...,...
102,MISSION ST,OTTAWA AVE,-122.444936,37.714577,-122.409844,37.780082
103,MISSION ST,NIAGARA AVE,-122.450226,37.720521,-122.409844,37.780082
104,MISSION ST,ACTON ST,-122.452355,37.708226,-122.409844,37.780082
105,MISSION ST,24TH ST,-122.410895,37.752702,-122.409844,37.780082


In [187]:
top_10

Unnamed: 0,index,Street_Two_Name
0,MORSE ST,1895
1,15TH ST,174
2,SENECA AVE,85
3,12TH ST,9
4,14TH ST,6
5,PERSIA AVE,5
6,OLIVER ST,4
7,13TH ST,3
8,NEY ST,2
9,COLLEGE AVE,1


In [199]:
# Find the top-10 stop that is farthest (great-circle) from final_loc.
max_dist=-1
cord=None
for index, stop in top_10.iterrows():
  match=bus[bus['Street_Two']==stop['index']]
  stop_cord=[match.lat.item(),match.lon.item()]
  dist=GRC(final_loc,stop_cord).km
  if dist>max_dist:
    max_dist=dist
    cord=stop_cord

TODO: for a better visualization, make the route line pass through all the selected bus stops.

In [234]:
import json
import openrouteservice
from openrouteservice import convert
import folium

client = openrouteservice.Client(key='5b3ce3597851110001cf6248f04736633a61445cbe175a9bb186b8e6')
coords = ((final_loc[1],final_loc[0]),(cord[1],cord[0]))
# One request is enough: use the walking profile and reuse the response
# for both the route geometry and the distance/duration summary.
res = client.directions(coordinates=coords,profile="foot-walking",radiuses=[10000,10000],preference="recommended")
geometry = res['routes'][0]['geometry']
decoded = convert.decode_polyline(geometry)
distance_txt = "<h4><b>Distance:&nbsp;<strong>"+str(round(res['routes'][0]['summary']['distance']/1000,1))+" km</strong></b></h4>"
duration_txt = "<h4><b>Duration:&nbsp;<strong>"+str(round(res['routes'][0]['summary']['duration']/60,1))+" min</strong></b></h4>"
m = folium.Map(location=final_loc,zoom_start=10, control_scale=True,tiles="cartodbpositron")
# Add the route once, with the distance/duration popup attached
folium.GeoJson(decoded).add_child(folium.Popup(distance_txt+duration_txt,max_width=300)).add_to(m)

folium.Marker(location=[final_loc[0],final_loc[1]],popup="Work",).add_to(m)
for i in range(0,len(top_10)):
   folium.Marker(
      location=[bus[bus['Street_Two']==top_10['index'][i]].lat.item(),bus[bus['Street_Two']==top_10['index'][i]].lon.item()],
      popup=top_10['index'][i],  # label each marker with its own stop name
   ).add_to(m)

m.save('map.html')
m

With the non-data-science approach done, the next step is to cluster the locations and find the top 10 bus stops using a functional algorithm.
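
The clustering idea can be sketched as follows: group employee locations with k-means, then snap each cluster center to its nearest candidate bus stop. This is a rough illustration and not a finished solution. The functions `kmeans` and `snap_to_stops`, the evenly spaced initialization, the toy planar points, and the use of Euclidean distance on raw coordinates are all assumptions; over a city-sized area, treating latitude and longitude as planar is only an approximation.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm on 2-D points; returns the k cluster centers."""
    # Deterministic, simplistic initialization: evenly spaced input points
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    for _ in range(iters):
        # Assign every point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty)
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def snap_to_stops(centers, stops):
    """Map each center to the index of its nearest stop; duplicates collapse."""
    return sorted({min(range(len(stops)), key=lambda s: math.dist(c, stops[s])) for c in centers})

# Toy data (hypothetical): two groups of employees and three candidate stops
employees = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
stops = [(0.5, 0.5), (10.5, 10.5), (5, 5)]
picked = snap_to_stops(kmeans(employees, k=2), stops)
print(picked)
```

Because two centers can snap to the same stop, the result may contain fewer than `k` stops; a fuller version would fill the remaining slots, for example with the next-nearest unused stops.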