# Introduction

We are looking for quantitative criteria to identify the best cities in the world in terms of quality of life.
We will be using the <a href="https://developers.teleport.org/api/getting_started/" target="_blank">Teleport</a> API:

- <a href="https://developers.teleport.org/api/getting_started/" target="_blank">Teleport API documentation</a>

We will also need a website called RandomList.com, which gives a list of random cities around the world to score.
All the data will be stored in a public S3 bucket.


## Part 1: Get Data for One City

In [2]:
import requests

In [3]:
response = requests.get("https://api.teleport.org/api/cities/?search=Paris")
response

<Response [200]>
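Instead of writing the query string by hand, `requests` can build and percent-encode it from a `params` dict, which helps with city names containing spaces or accents such as São Paulo. A minimal sketch that only prepares the request, so the final URL can be inspected without calling the API (`build_search_url` is a hypothetical helper, not part of the Teleport API):

```python
import requests

SEARCH_URL = "https://api.teleport.org/api/cities/"

def build_search_url(city: str) -> str:
    """Return the URL that requests.get(SEARCH_URL, params={"search": city}) would call."""
    # prepare() encodes the params without sending anything over the network
    prepared = requests.Request("GET", SEARCH_URL, params={"search": city}).prepare()
    return prepared.url

print(build_search_url("Paris"))      # https://api.teleport.org/api/cities/?search=Paris
print(build_search_url("São Paulo"))  # https://api.teleport.org/api/cities/?search=S%C3%A3o+Paulo
```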

The search returns several results; only the first one (index `[0]`) is relevant.

In [4]:
response.json()

{'_embedded': {'city:search-results': [{'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/'}},
    'matching_alternate_names': [{'name': 'Paris'},
     {'name': 'paris'},
     {'name': 'Parisi'}],
    'matching_full_name': 'Paris, Île-de-France, France'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:4717560/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Paris, Texas, United States'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:3489854/'}},
    'matching_alternate_names': [],
    'matching_full_name': 'Kingston, Kingston, Jamaica'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:966166/'}},
    'matching_alternate_names': [{'name': 'Paris'}],
    'matching_full_name': 'Parys, Orange Free State, South Africa (Paris)'},
   {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geoname

In [5]:
response.json().keys()

dict_keys(['_embedded', '_links', 'count'])

In [6]:
response.json()['_embedded'].keys()

dict_keys(['city:search-results'])

In [7]:
first_result = response.json()["_embedded"]["city:search-results"][0]
paris_link = first_result["_links"]["city:item"]["href"]
paris_link

'https://api.teleport.org/api/cities/geonameid:2988507/'
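The selection step can be wrapped in a small helper that also copes with a search returning no match. A sketch, assuming (this is not shown in the output above) that an unsuccessful search returns an empty `city:search-results` list; `first_city_link` and the trimmed `sample` dict are illustrative:

```python
def first_city_link(search_json: dict):
    """Return the href of the first search result, or None if the search found nothing."""
    results = search_json.get("_embedded", {}).get("city:search-results", [])
    if not results:
        return None
    return results[0]["_links"]["city:item"]["href"]

# Trimmed response shaped like the Paris search output above
sample = {
    "_embedded": {"city:search-results": [
        {"_links": {"city:item": {"href": "https://api.teleport.org/api/cities/geonameid:2988507/"}},
         "matching_full_name": "Paris, Île-de-France, France"},
    ]},
    "count": 1,
}
print(first_city_link(sample))        # https://api.teleport.org/api/cities/geonameid:2988507/
print(first_city_link({"count": 0}))  # None
```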

In [8]:
paris_info = requests.get(paris_link)
paris_info.json()

{'_links': {'city:admin1_division': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/admin1_divisions/geonames:11/',
   'name': 'Île-de-France'},
  'city:alternate-names': {'href': 'https://api.teleport.org/api/cities/geonameid:2988507/alternate_names/'},
  'city:country': {'href': 'https://api.teleport.org/api/countries/iso_alpha2:FR/',
   'name': 'France'},
  'city:timezone': {'href': 'https://api.teleport.org/api/timezones/iana:Europe%2FParis/',
   'name': 'Europe/Paris'},
  'city:urban_area': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/',
   'name': 'Paris'},
  'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templa

In [9]:
paris_scores = requests.get(paris_info.json()["_links"]["city:urban_area"]["href"]+"scores/")
paris_scores

<Response [200]>
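Scores are attached to the city's urban area, so the scores URL is the `city:urban_area` link followed by `scores/`. A sketch of that step as a hypothetical helper `scores_url`, assuming a city item that does not belong to any urban area simply lacks the `city:urban_area` link; with plain indexing, as in the cell above, such a city raises `KeyError`:

```python
def scores_url(city_json: dict):
    """Return the scores URL of the city's urban area, or None if the city has no such link."""
    urban_area = city_json.get("_links", {}).get("city:urban_area")
    if urban_area is None:
        return None
    return urban_area["href"] + "scores/"

# Trimmed version of the Paris city item shown above
paris_city = {"_links": {"city:urban_area": {
    "href": "https://api.teleport.org/api/urban_areas/slug:paris/", "name": "Paris"}}}
print(scores_url(paris_city))  # https://api.teleport.org/api/urban_areas/slug:paris/scores/
```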

In [10]:
paris_scores.json()

{'_links': {'curies': [{'href': 'https://developers.teleport.org/api/resources/Location/#!/relations/{rel}/',
    'name': 'location',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/City/#!/relations/{rel}/',
    'name': 'city',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/UrbanArea/#!/relations/{rel}/',
    'name': 'ua',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Country/#!/relations/{rel}/',
    'name': 'country',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Admin1Division/#!/relations/{rel}/',
    'name': 'a1',
    'templated': True},
   {'href': 'https://developers.teleport.org/api/resources/Timezone/#!/relations/{rel}/',
    'name': 'tz',
    'templated': True}],
  'self': {'href': 'https://api.teleport.org/api/urban_areas/slug:paris/scores/'}},
 'categories': [{'color': '#f3c32c',
   'name': 'Housing',
   'score_out_of_10': 3.5835}

In [11]:
import pandas as pd 
paris_df = pd.DataFrame(paris_scores.json()["categories"])
paris_df

|   | color | name | score_out_of_10 |
|---|---|---|---|
| 0 | #f3c32c | Housing | 3.5835 |
| 1 | #f3d630 | Cost of Living | 3.664 |
| 2 | #f4eb33 | Startups | 9.2765 |
| 3 | #d2ed31 | Venture Capital | 7.513 |
| 4 | #7adc29 | Travel Connectivity | 10.0 |
| 5 | #36cc24 | Commute | 5.3305 |
| 6 | #19ad51 | Business Freedom | 8.088333 |
| 7 | #0d6999 | Safety | 6.2465 |
| 8 | #051fa5 | Healthcare | 8.757 |
| 9 | #150e78 | Education | 7.085 |


In [12]:
paris_df['City'] = 'Paris'

In [13]:
paris_df

|   | color | name | score_out_of_10 | City |
|---|---|---|---|---|
| 0 | #f3c32c | Housing | 3.5835 | Paris |
| 1 | #f3d630 | Cost of Living | 3.664 | Paris |
| 2 | #f4eb33 | Startups | 9.2765 | Paris |
| 3 | #d2ed31 | Venture Capital | 7.513 | Paris |
| 4 | #7adc29 | Travel Connectivity | 10.0 | Paris |
| 5 | #36cc24 | Commute | 5.3305 | Paris |
| 6 | #19ad51 | Business Freedom | 8.088333 | Paris |
| 7 | #0d6999 | Safety | 6.2465 | Paris |
| 8 | #051fa5 | Healthcare | 8.757 | Paris |
| 9 | #150e78 | Education | 7.085 | Paris |


* We now need to upload this DataFrame to S3 using [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

In [14]:
!pip install boto3



In [15]:
# Create a generic session for my AWS account

import boto3

ACCESS_KEY = "**************************"
SECRET_KEY = "******************************************"

session = boto3.Session(aws_access_key_id=ACCESS_KEY, 
                        aws_secret_access_key=SECRET_KEY)
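Hard-coding the keys in the notebook makes them easy to leak when it is shared. A minimal sketch that reads them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead (`aws_credentials_from_env` is a hypothetical helper):

```python
import os
from typing import Mapping, Tuple

def aws_credentials_from_env(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the AWS key pair from the standard environment variables."""
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if name not in env]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
```

The pair can then be passed to `boto3.Session(aws_access_key_id=..., aws_secret_access_key=...)` as above. Since boto3's default credential chain reads these same variables, `boto3.Session()` without arguments would also find them.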

In [16]:
# access S3 as a resource
s3 = session.resource("s3")

In [17]:


BUCKET_NAME = "scoring-cities-in-the-world-had"

my_bucket = s3.create_bucket(Bucket=BUCKET_NAME)
print("Done------")

Done------
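`create_bucket` with only `Bucket=` works in the default region `us-east-1`; in any other region, S3 expects a `CreateBucketConfiguration` with a `LocationConstraint`. A sketch of a hypothetical helper that builds the arguments for a given region:

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Keyword arguments for s3.create_bucket(**kwargs) in the given AWS region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is left out of the configuration
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

For example `s3.create_bucket(**create_bucket_kwargs(BUCKET_NAME, session.region_name))`, assuming the session has a region configured (`region_name` is `None` otherwise).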


* Use pandas to export the DataFrame as CSV (`to_csv()` without a path returns the CSV text as a string)

In [18]:
csv = paris_df.to_csv()

* Use the `put_object()` method to create an object in the bucket we just created

In [19]:
put_object = my_bucket.put_object(Key="Paris-scoring.csv", Body=csv)
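By default, `to_csv()` writes the DataFrame index as an unnamed first column. When such a file is read back, pandas names that column `Unnamed: 0`, which is why it is dropped later on. A small round trip with two rows of the Paris table shows the difference `index=False` makes (if the files were written that way, the later `drop` of `'Unnamed: 0'` would have to be removed):

```python
import io
import pandas as pd

# Two rows taken from the Paris table above
scores = pd.DataFrame({"name": ["Housing", "Cost of Living"],
                       "score_out_of_10": [3.5835, 3.664],
                       "City": ["Paris", "Paris"]})

# Read each CSV string back, as pd.read_csv does with the downloaded files
with_index = pd.read_csv(io.StringIO(scores.to_csv()))
without_index = pd.read_csv(io.StringIO(scores.to_csv(index=False)))

print(list(with_index.columns))     # ['Unnamed: 0', 'name', 'score_out_of_10', 'City']
print(list(without_index.columns))  # ['name', 'score_out_of_10', 'City']
```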

## Part 2: Get Data for Several Cities

* To get a first list, we scrape [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_largest_cities), which lists the world's largest cities.
We will enrich this list with some other cities of interest.

In [20]:
!pip install Scrapy



In [21]:
import os
import logging

import scrapy
from scrapy.crawler import CrawlerProcess

In [22]:
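class CityNamesSpider(scrapy.Spider):

    name = "citynames"

    start_urls = [
        'https://en.wikipedia.org/wiki/List_of_largest_cities',
    ]

    def parse(self, response):
        # All table rows of the page; the slice [6:87] was picked by hand to cover
        # the largest-cities table and must be updated if the page layout changes
        cities = response.css("tr")
        for city in cities[6:87]:
            # First link text in the row (None when the row contains no link)
            yield {
                'city': city.css('a::text').get()
            }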
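`city.css('a::text').get()` returns the first text node found inside a link of the row, or `None` when the row has no link, which is why a header row shows up as an empty city later on. The same per-row logic can be sketched with the standard library's `html.parser`; this is a simplified stand-in for Scrapy's selectors, used only so the sketch runs without a crawl, and the HTML snippet is made up:

```python
from html.parser import HTMLParser

class FirstLinkPerRow(HTMLParser):
    """Record, for each <tr>, the first text found inside an <a> (None if there is none)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one entry per <tr>, in document order
        self._in_row = False    # True between <tr> and </tr>
        self._in_link = False   # True while inside an <a> of the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append(None)
            self._in_row = True
        elif tag == "a" and self._in_row:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Keep only the first link text of the current row
        if self._in_link and self.rows[-1] is None:
            self.rows[-1] = data

# Made-up snippet: a header row without links, then one data row
snippet = """
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td><a href="/wiki/Tokyo">Tokyo</a></td><td><a href="/wiki/Japan">Japan</a></td></tr>
</table>
"""
parser = FirstLinkPerRow()
parser.feed(snippet)
print(parser.rows)  # [None, 'Tokyo']
```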

In [23]:
filename = "cities.json"
filepath = os.path.join('Scrapping_results', filename)

# Delete the results of a previous run: the local JSON feed appends to an existing file
if os.path.exists(filepath):
    os.remove(filepath)

process = CrawlerProcess(settings = {
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'LOG_LEVEL': logging.INFO,
    "FEEDS": {
        filepath: {"format": "json"},
    }
})

process.crawl(CityNamesSpider)
# The Twisted reactor cannot be restarted: re-running this cell needs a kernel restart
process.start()

2021-09-09 13:34:04 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: scrapybot)
2021-09-09 13:34:04 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.6 | packaged by conda-forge | (default, Oct  7 2020, 19:08:05) - [GCC 7.5.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1h  22 Sep 2020), cryptography 3.1.1, Platform Linux-5.4.109+-x86_64-with-glibc2.10
2021-09-09 13:34:04 [scrapy.crawler] INFO: Overridden settings:
{'LOG_LEVEL': 20,
 'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2021-09-09 13:34:04 [scrapy.extensions.telnet] INFO: Telnet Password: b4cfaec9d86c056b
2021-09-09 13:34:04 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2021-09-09 13:34:04 [scrapy.middleware] INFO: Enab

In [24]:
city_names = pd.read_json("Scrapping_results/cities.json")
city_names.head()

2021-09-09 13:34:04 [numexpr.utils] INFO: NumExpr defaulting to 4 threads.


|   | city |
|---|---|
| 0 |  |
| 1 | Tokyo |
| 2 | Delhi |
| 3 | Seoul |
| 4 | Shanghai |


In [25]:
# Drop row 0: it comes from a table row without a link, so its city is empty
city_names = city_names.loc[1:, :]
city_names.head()

|   | city |
|---|---|
| 1 | Tokyo |
| 2 | Delhi |
| 3 | Seoul |
| 4 | Shanghai |
| 5 | São Paulo |


In [26]:
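# Add a few cities of interest (DataFrame.append was removed in pandas 2.0, so use pd.concat)
add_on = pd.DataFrame({'city': ['Oslo', 'Stockholm', 'Montreal']})
city_names = pd.concat([city_names, add_on], ignore_index=True)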

In [27]:
city_names.shape

(83, 1)

In [28]:
# For each city: search for it, follow the links to its urban-area scores,
# and store the scores as a CSV file in the S3 bucket
for city in city_names['city']:
    try:
        # params lets requests encode names with spaces or accents
        search_city = requests.get("https://api.teleport.org/api/cities/", params={"search": city})
        first_result = search_city.json()["_embedded"]["city:search-results"][0]
        city_link = first_result["_links"]["city:item"]["href"]
        city_info = requests.get(city_link)
        city_scores = requests.get(city_info.json()["_links"]["city:urban_area"]["href"] + "scores/")
        city_df = pd.DataFrame(city_scores.json()["categories"])
        city_df['City'] = city
        csv = city_df.to_csv()
        put_object = my_bucket.put_object(Key="{}-scoring.csv".format(city), Body=csv)
        print("{} done".format(city))
    # e.g. an empty search (IndexError) or a missing link (KeyError);
    # unlike a bare except, this does not swallow KeyboardInterrupt
    except Exception:
        print("Couldn't find results for {}".format(city))

Tokyo done
Delhi done
Seoul done
Shanghai done
São Paulo done
Mexico City done
Cairo done
Mumbai done
Beijing done
Couldn't find results for Dhaka
Osaka done
New York done
Couldn't find results for Karachi
Buenos Aires done
Couldn't find results for Chongqing
Istanbul done
Couldn't find results for Kolkata
Manila done
Lagos done
Rio de Janeiro done
Couldn't find results for Tianjin
Couldn't find results for Kinshasa
Guangzhou done
Los Angeles done
Moscow done
Shenzhen done
Couldn't find results for Lahore
Bangalore done
Paris done
Bogotá done
Jakarta done
Chennai done
Lima done
Bangkok done
Couldn't find results for Nagoya
Hyderabad done
London done
Tehran done
Chicago done
Couldn't find results for Chengdu
Couldn't find results for Nanjing
Couldn't find results for Wuhan
Ho Chi Minh City done
Couldn't find results for Luanda
Couldn't find results for Ahmedabad
Kuala Lumpur done
Couldn't find results for Xi'an
Hong Kong done
Dongguan done
Hangzhou done
Foshan done
Couldn't find results
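A broad `except` in the loop above hides why a city failed: an empty search and a missing urban-area link print the same message. One way to keep the reason is to separate the per-city work from the bookkeeping. A sketch where `fetch` is a hypothetical callable wrapping the search, city-item and scores requests, and only the lookup errors of that chain are caught, so network or S3 errors still surface:

```python
from typing import Callable, Dict, Iterable, Tuple

def score_all(cities: Iterable[str],
              fetch: Callable[[str], object]) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Call fetch(city) for every city, keeping the results and the reason for each failure."""
    results, failures = {}, {}
    for city in cities:
        try:
            results[city] = fetch(city)
        # IndexError: empty search result; KeyError: a missing key such as a link
        except (IndexError, KeyError) as exc:
            failures[city] = f"{type(exc).__name__}: {exc}"
    return results, failures

def fake_fetch(city):
    """Hypothetical stand-in for the real API chain."""
    if city == "Nowhere":
        raise KeyError("city:urban_area")
    return [{"name": "Housing", "score_out_of_10": 3.5835}]

results, failures = score_all(["Paris", "Nowhere"], fake_fetch)
print(sorted(results))  # ['Paris']
print(failures)         # {'Nowhere': "KeyError: 'city:urban_area'"}
```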

In [29]:
# Print the contents of the bucket
for s3_file in s3.Bucket(BUCKET_NAME).objects.all():
    print(s3_file.key) 

Atlanta-scoring.csv
Bangalore-scoring.csv
Bangkok-scoring.csv
Barcelona-scoring.csv
Beijing-scoring.csv
Bogotá-scoring.csv
Buenos Aires-scoring.csv
Cairo-scoring.csv
Chennai-scoring.csv
Chicago-scoring.csv
Dallas-scoring.csv
Dar es Salaam-scoring.csv
Delhi-scoring.csv
Dongguan-scoring.csv
Foshan-scoring.csv
Fukuoka-scoring.csv
Guangzhou-scoring.csv
Hangzhou-scoring.csv
Ho Chi Minh City-scoring.csv
Hong Kong-scoring.csv
Houston-scoring.csv
Hyderabad-scoring.csv
Istanbul-scoring.csv
Jakarta-scoring.csv
Johannesburg-scoring.csv
Kuala Lumpur-scoring.csv
Lagos-scoring.csv
Lima-scoring.csv
London-scoring.csv
Los Angeles-scoring.csv
Madrid-scoring.csv
Manila-scoring.csv
Mexico City-scoring.csv
Miami-scoring.csv
Montreal-scoring.csv
Moscow-scoring.csv
Mumbai-scoring.csv
New York-scoring.csv
Osaka-scoring.csv
Oslo-scoring.csv
Paris-scoring.csv
Philadelphia-scoring.csv
Rio de Janeiro-scoring.csv
Riyadh-scoring.csv
Saint Petersburg-scoring.csv
Santiago-scoring.csv
Seoul-scoring.csv
Shanghai-scori

In [30]:
# To test a single file:
#s3.Bucket('BUCKET_NAME').download_file('OBJECT_NAME', 'FILE_NAME')
#s3_file_key = 'Paris-scoring.csv'
#s3.Bucket(BUCKET_NAME).download_file(s3_file_key, 'Paris-scoring.csv')

In [31]:
df = pd.DataFrame(columns=['color', 'name', 'score_out_of_10', 'City'])
df

|   | color | name | score_out_of_10 | City |
|---|---|---|---|---|


In [32]:
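# Download all CSV files to the local directory and stack them into one DataFrame
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
bucket = s3.Bucket(BUCKET_NAME)
frames = []
for s3_file in bucket.objects.all():
    bucket.download_file(s3_file.key, s3_file.key)
    frames.append(pd.read_csv(s3_file.key))
df = pd.concat(frames)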
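Writing every file to the local directory is not strictly necessary: pandas can read CSV data from an in-memory buffer. A sketch of a hypothetical helper that concatenates raw CSV payloads; the two payloads below reuse values from the tables in this notebook:

```python
import io
import pandas as pd

def frames_from_csv_payloads(payloads):
    """Concatenate CSV payloads (bytes) into one DataFrame without writing local files."""
    frames = [pd.read_csv(io.BytesIO(payload)) for payload in payloads]
    return pd.concat(frames, ignore_index=True)

payloads = [
    b"name,score_out_of_10,City\nHousing,4.9755,Atlanta\n",
    b"name,score_out_of_10,City\nHousing,3.5835,Paris\n",
]
combined = frames_from_csv_payloads(payloads)
print(combined)
```

With boto3, the payloads could come from `obj.get()["Body"].read()` for each `obj` in `s3.Bucket(BUCKET_NAME).objects.all()`. Note that `pd.concat` raises `ValueError` on an empty list, for instance with an empty bucket.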

In [33]:
df.drop(['color','Unnamed: 0'], axis = 1, inplace = True)

In [34]:
df.head()

|   | name | score_out_of_10 | City |
|---|---|---|---|
| 0 | Housing | 4.9755 | Atlanta |
| 1 | Cost of Living | 5.241 | Atlanta |
| 2 | Startups | 8.835 | Atlanta |
| 3 | Venture Capital | 7.257 | Atlanta |
| 4 | Travel Connectivity | 5.2915 | Atlanta |


In [36]:
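# Environmental quality score per city
is_env_quality = df['name'] == 'Environmental Quality'
df_env = df[is_env_quality]
# Select the numeric column before averaging: pandas 2.x no longer drops text columns silently
df_env.groupby('City')[['score_out_of_10']].mean().sort_values(by='score_out_of_10', ascending=False).head(10)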

| City | score_out_of_10 |
|---|---|
| Stockholm | 8.99175 |
| Oslo | 8.42425 |
| Singapore | 7.95425 |
| Fukuoka | 7.916 |
| Montreal | 7.7215 |
| Toronto | 7.1775 |
| Washington, D.C. | 6.99375 |
| Chicago | 6.8045 |
| Seoul | 6.787 |
| Dallas | 6.693 |
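Filtering on one category works, but pivoting the long table into one column per category makes it possible to rank by any category, or compare several, in one step. A sketch with `pivot_table` (using its default mean aggregation) on a few rows taken from the tables above:

```python
import pandas as pd

# Long format, like df: one row per (City, category); values from the tables above
long_df = pd.DataFrame({
    "City": ["Atlanta", "Atlanta", "Paris", "Paris"],
    "name": ["Housing", "Cost of Living", "Housing", "Cost of Living"],
    "score_out_of_10": [4.9755, 5.241, 3.5835, 3.664],
})

# One row per city, one column per category
wide = long_df.pivot_table(index="City", columns="name", values="score_out_of_10")
ranked = wide.sort_values("Housing", ascending=False)
print(ranked)
```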


In [37]:
# Average score over all categories, per city
group_city = df.groupby('City')[['score_out_of_10']].mean()
group_city.sort_values(['score_out_of_10'], ascending=False).head(10)

| City | score_out_of_10 |
|---|---|
| Singapore | 7.145069 |
| London | 6.894015 |
| Toronto | 6.799284 |
| Montreal | 6.700363 |
| New York | 6.650098 |
| Tokyo | 6.572044 |
| Paris | 6.488446 |
| Los Angeles | 6.439755 |
| Chicago | 6.424485 |
| Hong Kong | 6.366725 |
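The ranking above is an unweighted mean, so every category counts the same. If some criteria matter more, a weighted mean $\bar{s} = \sum_k w_k s_k / \sum_k w_k$ can be used instead. A sketch with hypothetical weights (they are illustrative, not part of the Teleport data):

```python
def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted mean of category scores; weights for categories absent from `scores` are ignored."""
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("none of the weighted categories appear in scores")
    return sum(scores[name] * w for name, w in used.items()) / total_weight

# Category scores for Paris, from the table at the top of this notebook
paris = {"Housing": 3.5835, "Safety": 6.2465, "Healthcare": 8.757}

# Hypothetical priorities: safety counts three times as much as housing
paris_weighted = weighted_score(paris, {"Housing": 1, "Safety": 3})
print(round(paris_weighted, 3))  # 5.581
```

For a city in `df`, the scores dict could be built with `df[df['City'] == 'Paris'].set_index('name')['score_out_of_10'].to_dict()`.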


### For further analysis, a data-visualization tool works best; it only needs the data in a format such as CSV

In [38]:
# Export for use in a data-visualization tool
df.to_csv("dataviz.csv", index=False)