# City Search Tool

There are a lot of factors that go into making a big move, and for many people, the top priority is either their job or their family. But if you’re on your own and you have job flexibility to go basically wherever you want (i.e. you work remotely), then what? In that case, you have the luxury of finding a place that suits you—and not necessarily just your career.

Many decisions go into picking the perfect place to call home: political leanings, crime rates, walkability, affordability, religious affiliations, weather, and more. Can you make a tool that helps Aggie graduates and others find their next move?

[High Speed Internet](https://www.highspeedinternet.com/best-cities-to-live-work-remotely) (of all people?!) made a tool to do this... but you can do better! Think of more factors, like median income of a location, cuisine, primary ethnicity, pollution index, happiness index, or the number of coffee shops or microbreweries in the city. There's no end! Furthermore, maybe you are an international student and want to make this tool for global placement. Go for it! Maybe you want to penalize distance from POIs (points of interest) like family. Do it! The world is your oyster!

#### Starter Datasets
- [MoveHub City Ratings](https://www.kaggle.com/blitzr/movehub-city-rankings?select=movehubqualityoflife.csv)
  - [Notebooks for ideas on how to use data](https://www.kaggle.com/blitzr/movehub-city-rankings/notebooks)
- [World City Populations](https://www.kaggle.com/max-mind/world-cities-database?select=worldcitiespop.csv)
- [Rental Price](https://www.kaggle.com/zillow/rent-index)

#### Where to Find More Data
- [Google Datasets](https://datasetsearch.research.google.com/)
- [US Census](https://data.census.gov/cedsci/?q=United%20States)
- [Kaggle Datasets](https://www.kaggle.com/datasets)


#### How We Judge
- *Data Use*: Effectively used data, acquired additional data
- *Analytics*: Effective application of analytics (bonus points for ML/clustering techniques)
- *Visualization*: Solution is visually appealing and useful (bonus points if you create an interactive tool, application, or website)
- *Impact*: Clear impact of solution to solving problem

#### Helpful Workshops
- Intro to Python: Sat, 10:30-12:00
- Statistics for Data Scientists: Sat, 10:30-12:00
- How to Win TAMU Datathon: Sat, 13:00-14:00
- Data Wrangling: Sat, 17:00-18:15
- Data Visualization: Sat, 18:30-19:45
- Machine Learning Part 1 - Theory: Sat, 20:00-21:15
- Machine Learning Part 2 - Applied: Sat, 21:30-22:45


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import geograpy
import country_converter as coco
from scipy.spatial.distance import cdist

In [8]:
df = pd.read_csv('https://tamu-datathon-2020.s3.us-east-2.amazonaws.com/data/country.csv', index_col = 0).drop("Unnamed: 0.1", axis = 1)

In [None]:
# iso2_codes = coco.convert(names=df_covid["country_name"].tolist(), to='ISO2')
# df_covid["country_code"] = iso2_codes

# df_happiness["Country (region)"] = coco.convert(names=df_happiness["Country (region)"].tolist(), to='ISO2')
# df_covid_filter = df_covid[df_covid["country_code"].isin(df["Country_Code"])]

# df_democracy = pd.read_csv("democracy.csv")[["Country_Code", "Score"]]

# # Merge democracy index
# df_merged = pd.merge(df, df_democracy, how='inner', on="Country_Code")
# df_merged = df_merged.rename(columns={"Score": "Democracy"})

# # Merge happiness rank
# df_merged_2 = pd.merge(df_merged, df_happiness[["Country_Code", "Ladder"]], how='inner', on="Country_Code")
# df_merged_2 = df_merged_2.rename(columns={"Ladder": "Happiness"})

# # Standardize happiness
# df_merged_2["Happiness"] = ((1 - (df_merged_2["Happiness"]/df_merged_2["Happiness"].max())) * 100).round(2)

# df_merged_2.to_csv("country.csv")
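
The commented-out standardization step above turns a happiness rank (where 1 is the happiest country) into a 0-100 score where higher is better. As a minimal sketch of that same formula on a plain Python list, assuming made-up ranks and a hypothetical helper name `rank_to_score` that is not part of the original notebook:

```python
def rank_to_score(ranks):
    """Map ranks (1 = best) to scores via (1 - rank / max_rank) * 100."""
    worst = max(ranks)
    return [round((1 - r / worst) * 100, 2) for r in ranks]

# Illustrative ranks only, not taken from the happiness dataset
scores = rank_to_score([1, 50, 100, 156])
print(scores)
```

With this formula the worst-ranked entry always scores 0, while the best-ranked entry lands slightly below 100.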

In [None]:
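# Assumes df_covid has already been loaded; it is not created in any active cell above
df_covid = df_covid[df_covid['Latitude'].notna()].copy()  # copy so the new column is added safely
# Build a location label from the country and subregion names (subregion1_name appears twice)
df_covid["City"] = df_covid["country_name"] + df_covid["subregion1_name"] + df_covid["subregion1_name"]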
covid_locations = df_covid[["Latitude", "Longitude", "City"]].rename(columns={
    "Latitude": "lat", 
    "Longitude": "lng"
})
general_locations = df[["lat", "lng", "City", "Country"]].rename(columns={
    "City": "cty"
})

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

df1 = covid_locations
df2 = general_locations

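# Store each coordinate pair as a (lat, lng) tuple
df1['point'] = list(zip(df1['lat'], df1['lng']))
df2['point'] = list(zip(df2['lat'], df2['lng']))

# Build the candidate list once rather than once per row
covid_points = list(df1['point'])
df2['closest'] = [closest_point(x, covid_points) for x in df2['point']]
# Look up the label that belongs to each closest point
df2['City'] = [match_value(df1, 'point', x, 'City') for x in df2['closest']]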
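
The cell above matches locations by Euclidean distance on raw latitude/longitude degrees, which distorts distances away from the equator. One possible alternative is great-circle (haversine) distance. The sketch below is illustrative only: `haversine_km` and `nearest_city` are hypothetical helpers, the Earth radius of 6371 km is an assumed mean value, and the coordinates are taken from the `df` preview in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_city(point, candidates):
    """Return the name of the candidate closest to point.

    point: (lat, lng); candidates: dict mapping name -> (lat, lng).
    """
    return min(candidates, key=lambda name: haversine_km(*point, *candidates[name]))

candidates = {
    "Johannesburg": (-26.205, 28.049722),
    "Cape Town": (-33.928992, 18.417396),
    "Caracas": (10.506098, -66.914602),
}
pretoria = (-25.745937, 28.187944)
closest = nearest_city(pretoria, candidates)
print(closest)
```

The same distance function could also serve to penalize cities that are far from a point of interest, such as family.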

In [9]:
df

Unnamed: 0,City,Movehub Rating,Purchase Power,Health Care,Pollution,Quality of Life,Crime Rating,Country,Country_Code,lat,lng,Democracy,Happiness,Population,Price Per Square Foot (USD)
0,Caracas,65.18,11.25,44.44,83.45,8.61,85.70,Venezuela,VE,10.506098,-66.914602,2.88,27.03,2938992.0,98.12
1,Johannesburg,84.08,53.99,59.98,47.39,51.26,83.93,South Africa,ZA,-26.205000,28.049722,7.24,28.38,5782747.0,926.65
2,Cape Town,87.95,60.36,71.67,75.98,78.73,68.06,South Africa,ZA,-33.928992,18.417396,7.24,28.38,4617560.0,2181.94
3,Pretoria,80.56,46.74,71.11,70.13,61.44,68.06,South Africa,ZA,-25.745937,28.187944,7.24,28.38,2565660.0,616.44
4,Fortaleza,80.17,52.28,45.46,66.32,36.68,78.65,Brazil,BR,-3.730451,-38.521799,6.86,78.38,4073465.0,1480.41
5,Porto Alegre,70.46,19.07,51.01,86.16,31.87,76.46,Brazil,BR,-30.032500,-51.230377,6.86,78.38,4137417.0,1254.44
6,Rio De Janeiro,73.44,22.13,61.67,84.51,21.32,67.93,Brazil,BR,-22.911014,-43.209373,6.86,78.38,6720000.0,164.06
7,Sao Paulo,75.40,24.24,63.79,72.04,30.57,66.31,Brazil,BR,-23.550651,-46.633382,6.86,78.38,22043028.0,2068.98
8,Belo Horizonte,71.79,21.77,57.26,59.73,36.26,50.99,Brazil,BR,-19.922732,-43.945095,6.86,78.38,6084430.0,1484.65
9,Curitiba,76.15,34.97,59.03,33.59,58.67,46.27,Brazil,BR,-25.429596,-49.271272,6.86,78.38,3678732.0,1061.84


In [21]:
from PIL import Image
import requests
from io import BytesIO

image_links = pd.read_csv('df_country.csv')[["City", "Image"]]
image_links['paired'] = list(zip(image_links.City, image_links.Image))

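def resizer(x):
    """Download the image at x[1], resize it to 800x800, and save it as '<x[0]>.jpg'."""
    response = requests.get(x[1])
    response.raise_for_status()  # stop on a failed download instead of saving a broken file
    img = Image.open(BytesIO(response.content))
    # JPEG has no alpha channel, so convert to RGB before saving
    new_img = img.convert("RGB").resize((800, 800))
    new_img.save((x[0] + ".jpg"), "JPEG", optimize=True)
    print((x[0] + ".jpg"), "saved")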
image_links["paired"].apply(resizer)

Caracas.jpg saved
Johannesburg.jpg saved
Cape Town.jpg saved
Pretoria.jpg saved
Fortaleza.jpg saved
Porto Alegre.jpg saved
Rio De Janeiro.jpg saved
Sao Paulo.jpg saved
Belo Horizonte.jpg saved
Curitiba.jpg saved
Saint Louis.jpg saved
Detroit.jpg saved
Las Vegas.jpg saved
Philadelphia.jpg saved
Los Angeles.jpg saved
Miami.jpg saved
Houston.jpg saved
Tampa.jpg saved
Dallas.jpg saved
Baltimore.jpg saved
Atlanta.jpg saved
Orlando.jpg saved
Chicago.jpg saved
Washington.jpg saved
San Antonio.jpg saved
New York.jpg saved
Honolulu.jpg saved
Austin.jpg saved
New Orleans.jpg saved
Seattle.jpg saved
Phoenix.jpg saved
Boston.jpg saved
Minneapolis.jpg saved
San Diego.jpg saved
Portland.jpg saved
Charlotte.jpg saved
Asheville.jpg saved
Rochester.jpg saved
Indianapolis.jpg saved
Newark.jpg saved
Nashville.jpg saved
Mexico City.jpg saved
Noida.jpg saved
Gurgaon.jpg saved
Delhi.jpg saved
Bangalore.jpg saved
Kolkata.jpg saved
Pune.jpg saved
Mumbai.jpg saved
Chennai.jpg saved
Vadodara.jpg saved
Ahmedabad

0      None
1      None
       ... 
214    None
215    None
Name: paired, Length: 216, dtype: object
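
Resizing straight to `(800, 800)` as in the cell above stretches any photo that is not already square. One way to avoid that, offered here only as a sketch, is to crop to a centered square first. The hypothetical helper `center_square_box` below computes the crop box with plain arithmetic, in the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.

```python
def center_square_box(width, height):
    """Return (left, upper, right, lower) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)

box = center_square_box(1200, 800)
print(box)  # (200, 0, 1000, 800)
```

Inside `resizer`, this could be applied as `img.crop(center_square_box(*img.size)).resize((800, 800))`.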

In [23]:
df.to_csv("country.csv")

In [28]:
from bs4 import BeautifulSoup
import requests

In [39]:
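def get_weather(city):
    """Search climate-data.org for a city and print the matching "data" divs."""
    url = "https://en.climate-data.org/search/"
    page = requests.get(url, params={"q": city})  # params URL-encodes the city name
    soup = BeautifulSoup(page.text, "html.parser")  # name the parser explicitly
    divs = soup.find_all("div", {"class": "data"})
    print(divs)

get_weather("Hyderabad")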

[]


In [47]:
df

Unnamed: 0,City,Movehub Rating,Purchase Power,Health Care,Pollution,Quality of Life,Crime Rating,Country,Country_Code,lat,lng,Democracy,Happiness,Population,Price Per Square Foot (USD),paired
0,Caracas,65.18,11.25,44.44,83.45,8.61,85.70,Venezuela,VE,10.506098,-66.914602,2.88,27.03,2938992.0,98.12,"(Caracas, http://images.unsplash.com/photo-150..."
1,Johannesburg,84.08,53.99,59.98,47.39,51.26,83.93,South Africa,ZA,-26.205000,28.049722,7.24,28.38,5782747.0,926.65,"(Johannesburg, http://images.unsplash.com/phot..."
2,Cape Town,87.95,60.36,71.67,75.98,78.73,68.06,South Africa,ZA,-33.928992,18.417396,7.24,28.38,4617560.0,2181.94,"(Cape Town, http://images.unsplash.com/photo-1..."
3,Pretoria,80.56,46.74,71.11,70.13,61.44,68.06,South Africa,ZA,-25.745937,28.187944,7.24,28.38,2565660.0,616.44,"(Pretoria, http://images.unsplash.com/photo-15..."
4,Fortaleza,80.17,52.28,45.46,66.32,36.68,78.65,Brazil,BR,-3.730451,-38.521799,6.86,78.38,4073465.0,1480.41,"(Fortaleza, http://images.unsplash.com/photo-1..."
5,Porto Alegre,70.46,19.07,51.01,86.16,31.87,76.46,Brazil,BR,-30.032500,-51.230377,6.86,78.38,4137417.0,1254.44,"(Porto Alegre, http://images.unsplash.com/phot..."
6,Rio De Janeiro,73.44,22.13,61.67,84.51,21.32,67.93,Brazil,BR,-22.911014,-43.209373,6.86,78.38,6720000.0,164.06,"(Rio De Janeiro, http://images.unsplash.com/ph..."
7,Sao Paulo,75.40,24.24,63.79,72.04,30.57,66.31,Brazil,BR,-23.550651,-46.633382,6.86,78.38,22043028.0,2068.98,"(Sao Paulo, http://images.unsplash.com/photo-1..."
8,Belo Horizonte,71.79,21.77,57.26,59.73,36.26,50.99,Brazil,BR,-19.922732,-43.945095,6.86,78.38,6084430.0,1484.65,"(Belo Horizonte, http://images.unsplash.com/ph..."
9,Curitiba,76.15,34.97,59.03,33.59,58.67,46.27,Brazil,BR,-25.429596,-49.271272,6.86,78.38,3678732.0,1061.84,"(Curitiba, http://images.unsplash.com/photo-15..."


In [None]:
#@title Rate importance of each of the following factors

movehub_rating = "High" #@param ["None", "Low", "Med", "High"]
purchase_power = "High" #@param ["None", "Low", "Med", "High"]
health_care = "Low" #@param ["None", "Low", "Med", "High"]
quality_of_life = "Low" #@param ["None", "Low", "Med", "High"]
pollution = "None" #@param ["None", "Low", "Med", "High"]
crime_rating = "None" #@param ["None", "Low", "Med", "High"]

weights = [
  movehub_rating,
  purchase_power,
  health_care,
  quality_of_life,
  pollution,
  crime_rating,
]

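# Map importance labels to numeric weights
replace = {'None': 0, 'Low': 1, 'Med': 2, 'High': 3}
weights = np.array([replace[x] for x in weights])
# Pollution and crime count against a city, so their weights are negated
weights *= [1, 1, 1, 1, -1, -1]

# Column order must match the order of the weights above
features = ['Movehub Rating', 'Purchase Power', 'Health Care', 'Quality of Life', 'Pollution', 'Crime Rating']
# Min-max scale a Series to the range [0, 1]
norm = lambda xs: (xs-xs.min())/(xs.max()-xs.min())

# Weighted sum of the features, rescaled to 0-10
df['Score'] = norm(df[features].dot(weights))*10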

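# Round only the score and feature columns; rounding lat/lng would move the markers
rounded = df.sort_values('Score', ascending=False).round({col: 0 for col in ['Score'] + features})
fig = px.scatter_mapbox(rounded,
                        lat="lat", lon="lng", color="Score", hover_name="City",
                        hover_data=features,
                        color_continuous_scale=px.colors.cyclical.IceFire, size_max=15, zoom=1,
                        mapbox_style="carto-positron")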
fig.show()

df.sort_values('Score', ascending=False)[['City', 'Score'] + features].round()
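
The score in the cell above is a weighted sum of the features (with pollution and crime counting negatively), min-max scaled to 0-10. The sketch below restates that idea in plain Python and adds a guard for the case where every city has the same raw score (for example, when every importance is "None"), since the min-max step would otherwise divide by zero. `score_cities` is a hypothetical helper, returning 0.0 for ties is an assumed design choice, and the feature values come from the `df` preview in this notebook.

```python
IMPORTANCE = {"None": 0, "Low": 1, "Med": 2, "High": 3}

def score_cities(rows, importance, negative=("Pollution", "Crime Rating")):
    """Return {city: score in [0, 10]} from a weighted sum of features.

    rows: {city: {feature: value}}
    importance: {feature: "None" | "Low" | "Med" | "High"}
    Features listed in `negative` count against a city.
    """
    weights = {
        f: IMPORTANCE[level] * (-1 if f in negative else 1)
        for f, level in importance.items()
    }
    raw = {
        city: sum(weights[f] * feats[f] for f in weights)
        for city, feats in rows.items()
    }
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All cities tie, so there is no range to scale by
        return {city: 0.0 for city in raw}
    return {city: 10 * (value - lo) / (hi - lo) for city, value in raw.items()}

rows = {
    "Caracas": {"Purchase Power": 11.25, "Health Care": 44.44, "Pollution": 83.45},
    "Cape Town": {"Purchase Power": 60.36, "Health Care": 71.67, "Pollution": 75.98},
    "Curitiba": {"Purchase Power": 34.97, "Health Care": 59.03, "Pollution": 33.59},
}
importance = {"Purchase Power": "High", "Health Care": "Low", "Pollution": "Med"}
scores = score_cities(rows, importance)
print(max(scores, key=scores.get))
```

Because the scaling is relative, the best city in any selection always scores 10 and the worst scores 0, so scores are only comparable within one run.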

[Maps with Plotly Express](https://plotly.com/python/plotly-express/#maps)

# Host On Web App?
Show us all you've got by building a dashboard web app in Python with
[Streamlit](https://www.streamlit.io/)!