## First, let's get the data for Cairo, Egypt into a clean DataFrame

### The data can be found at the following link: https://www.citypopulation.de/php/egypt-greatercairo.php, which is provided by citypopulation.de.
This data is not fully accurate, and some entries differ from those on Egyptian governmental websites. However, it is much easier to extract and is still reliable for the most part.

### Let's import BeautifulSoup and the libraries it needs to read data from a webpage.
The required libraries are beautifulsoup4, lxml and requests.

In [13]:
# inside a notebook, prefix pip with "!" so that it runs as a shell command
!pip install beautifulsoup4

In [8]:
!pip install lxml

In [9]:
!pip install requests

In [17]:
import requests
import lxml
from bs4 import BeautifulSoup

### Let's start working.
Use BeautifulSoup to read the page's HTML and get the HTML of the table that contains all the information we need.

In [18]:
source = requests.get('https://www.citypopulation.de/php/egypt-greatercairo.php').text
soup = BeautifulSoup(source, 'lxml')

In [19]:
tableHTML = soup.find('table', class_ = 'data')


Each row in the table is a 'tr' with class 'rname'.
Extract from each row the name of the neighbourhood, its area in sq km, its population and the link to the area's own webpage.

In [20]:
neghbourhoods = []
pop = []
area = []
links = []

for row in tableHTML.find_all('tr', class_ = 'rname'):
    neghbourhoods.append(row.td.text)
    pop.append(row.find(class_ = 'rpop prio1').text)
    area.append(str(row.find('td')).split("\"")[3])
    links.append(str(row.find(class_ = 'sc').a).split('\"')[1])
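The loop above pulls values out of the markup by splitting the tag's string form on quote characters, which depends on the exact attribute order. As a hedged alternative, the sketch below reads attributes through BeautifulSoup directly. The sample HTML and the `data-area` attribute name are made up for illustration; the real page's markup may differ, so check it before reusing this.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real page may use other attribute names.
SAMPLE_HTML = """
<table class="data">
  <tr class="rname">
    <td data-area="1.72"><a href="#">Example District</a></td>
    <td class="rpop prio1">41,605</td>
    <td class="sc"><a href="/en/egypt/greatercairo/example/">&rarr;</a></td>
  </tr>
</table>
"""

def parse_rows(html):
    """Return one dict per table row, reading attributes instead of splitting strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="data")
    rows = []
    for row in table.find_all("tr", class_="rname"):
        first_cell = row.find("td")
        rows.append({
            "name": first_cell.get_text(strip=True),
            "area": first_cell.get("data-area"),  # hypothetical attribute name
            "population": row.find(class_="rpop prio1").get_text(strip=True),
            "link": row.find(class_="sc").a.get("href"),
        })
    return rows

rows = parse_rows(SAMPLE_HTML)
print(rows)
```

Using `.get(...)` returns `None` for a missing attribute instead of raising an `IndexError`, which makes layout changes on the page easier to spot.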

Each link can be opened on its own with BeautifulSoup, and the latitude and longitude of the neighbourhood can be extracted from it.

In [21]:
lat = []
lng = []
for link in links:
    url = 'https://www.citypopulation.de' + link
    source = requests.get(url).text
    soup = BeautifulSoup(source, 'lxml')
    x = soup.find('div', id = 'admmap')
    lat.append(str(x.find_all('meta')[0]).split('\"')[1])
    lng.append(str(x.find_all('meta')[1]).split('\"')[1])
    print(link) # print the link to show how far the loop has progressed

/en/egypt/greatercairo/0104__15_māyū/
/en/egypt/greatercairo/0116__ābidīn/
/en/egypt/greatercairo/0114__ad_darb_al_aḥmar/
/en/egypt/greatercairo/2103__ad_duqqī/
/en/egypt/greatercairo/0134__ain_schams/
/en/egypt/greatercairo/2108__al_ahrām/
/en/egypt/greatercairo/2102__al_ajūzah/
/en/egypt/greatercairo/0131__al_amīriīah/
/en/egypt/greatercairo/0120__al_azbakiyah/
/en/egypt/greatercairo/2111__al_badrashayn/
/en/egypt/greatercairo/0107__al_basātīn/
/en/egypt/greatercairo/2110__al_ḥawāmidiyah/
/en/egypt/greatercairo/0122__al_jamāliyah/
/en/egypt/greatercairo/2104__al_jīzah/
/en/egypt/greatercairo/2109__al_jīzah/
/en/egypt/greatercairo/0111__al_khalīfah/
/en/egypt/greatercairo/1414__al_khānkah/
/en/egypt/greatercairo/1415__al_khānkah/
/en/egypt/greatercairo/1412__al_khuṣūṣ/
/en/egypt/greatercairo/0106__al_maādī/
/en/egypt/greatercairo/0135__al_marj/
/en/egypt/greatercairo/0103__al_maṣarah/
/en/egypt/greatercairo/0133__al_maṭariyah/
/en/egypt/greatercairo/0112__al_muqaṭṭam/
/en/egypt/greate

### Import pandas and build a DataFrame from all the lists we've created. Add the population density to the DataFrame.

In [22]:
import pandas as pd

In [23]:
neighbourhoodsDF = pd.DataFrame(list(zip(neghbourhoods, pop, area, links, lat, lng)), columns = ['Name', 'Population', 'Area/ sqkm', 'Link', 'Latitude', 'Longitude'] )
neighbourhoodsDF["Population"] = neighbourhoodsDF["Population"].str.replace(',', '', regex=False) # strip thousands separators
neighbourhoodsDF["Population"] = neighbourhoodsDF["Population"].astype(float)
neighbourhoodsDF["Area/ sqkm"] = neighbourhoodsDF["Area/ sqkm"].astype(float)
neighbourhoodsDF["Latitude"] = neighbourhoodsDF["Latitude"].astype(float)
neighbourhoodsDF["Longitude"] = neighbourhoodsDF["Longitude"].astype(float)
neighbourhoodsDF["Population Density"] = neighbourhoodsDF["Population"]/neighbourhoodsDF["Area/ sqkm"]
print(neighbourhoodsDF.shape)
neighbourhoodsDF.head(10)

(74, 7)


| | Name | Population | Area/ sqkm | Link | Latitude | Longitude | Population Density |
|---|---|---|---|---|---|---|---|
| 0 | 15 Māyū [15th of May City] | 96522.0 | 75.99 | /en/egypt/greatercairo/0104__15_māyū/ | 29.833 | 31.384 | 1270.193447 |
| 1 | 'Ābidīn | 41605.0 | 1.72 | /en/egypt/greatercairo/0116__ābidīn/ | 30.044 | 31.243 | 24188.953488 |
| 2 | Ad-Darb al-Aḥmar | 60336.0 | 1.87 | /en/egypt/greatercairo/0114__ad_darb_al_aḥmar/ | 30.041 | 31.258 | 32265.240642 |
| 3 | Ad-Duqqī | 73309.0 | 5.46 | /en/egypt/greatercairo/2103__ad_duqqī/ | 30.039 | 31.205 | 13426.556777 |
| 4 | 'Ain Schams | 633798.0 | 8.32 | /en/egypt/greatercairo/0134__ain_schams/ | 30.122 | 31.327 | 76177.644231 |
| 5 | Al-Ahrām | 681383.0 | 17.96 | /en/egypt/greatercairo/2108__al_ahrām/ | 29.994 | 31.138 | 37938.919822 |
| 6 | Al-'Ajūzah | 287818.0 | 7.35 | /en/egypt/greatercairo/2102__al_ajūzah/ | 30.062 | 31.199 | 39158.911565 |
| 7 | Al-Amīriīah | 157378.0 | 3.8 | /en/egypt/greatercairo/0131__al_amīriīah/ | 30.106 | 31.293 | 41415.263158 |
| 8 | Al-Azbakiyah [Azbakeya] | 20393.0 | 1.33 | /en/egypt/greatercairo/0120__al_azbakiyah/ | 30.057 | 31.245 | 15333.082707 |
| 9 | Al-Badrashayn [Badrshein] | 563294.0 | 135.1 | /en/egypt/greatercairo/2111__al_badrashayn/ | 29.823 | 31.258 | 4169.45966 |


In [24]:
neighbourhoodsDF.dtypes

Name                   object
Population            float64
Area/ sqkm            float64
Link                   object
Latitude              float64
Longitude             float64
Population Density    float64
dtype: object

Greater Cairo has 74 neighbourhoods.
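To make the cleaning step concrete, here is a small plain-Python sketch of what happens to a single row: the thousands separators are stripped from the population string, and the density is the population divided by the area. The function names are illustrative, and the raw string format `'96,522'` is an assumption based on the comma-stripping step above.

```python
def parse_population(text):
    """Convert a string such as '96,522' to a float by dropping thousands separators."""
    return float(text.replace(",", ""))

def population_density(population, area_sqkm):
    """People per square kilometre."""
    return population / area_sqkm

# Values for 15 Māyū, taken from the first row of the table above
population = parse_population("96,522")
density = population_density(population, 75.99)
print(round(density, 2))
```

The result agrees with the Population Density column for that row.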

## Let's visualize what we have done so far using folium.

In [25]:
!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library


In [30]:
# Look for the lat and lng of Cairo
latitude = 30.015
longitude = 31.313

In [31]:
# create map of Cairo using latitude and longitude values
map_cairo = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, pop, name, area, density in zip(neighbourhoodsDF['Latitude'], neighbourhoodsDF['Longitude'], neighbourhoodsDF['Population'], neighbourhoodsDF['Name'], neighbourhoodsDF['Area/ sqkm'], 
                                              neighbourhoodsDF['Population Density']):
    label = '{}, Area: {} sqKm, Population: {}, Population Density: {}'.format(name, area, pop, density)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cairo)  
    
map_cairo

### You can also visualize the population density across Cairo.
This can be done by building a choropleth map. However, GeoJSON files do not seem to be freely available for Egypt. Therefore, the circle marker size is varied according to the population density instead.

In [33]:
# create map of Cairo using latitude and longitude values
map_cairo = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, pop, name, area, density in zip(neighbourhoodsDF['Latitude'], neighbourhoodsDF['Longitude'], neighbourhoodsDF['Population'], neighbourhoodsDF['Name'], neighbourhoodsDF['Area/ sqkm'], 
                                              neighbourhoodsDF['Population Density']):
    label = '{}, Area: {} sqKm, Population: {}, Population Density: {}'.format(name, area, pop, density)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius= density/2000,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cairo)  
    
map_cairo

In the above map, it is clear that population density is high throughout most of central Cairo, except for the areas around Az-Zamālik and Qaṣr an-Nīl. Areas far away from central Cairo have much lower density.
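With `radius = density/2000`, the densest neighbourhoods get very large circles while sparse ones become almost invisible. One possible refinement, shown here only as a sketch, is to clamp the radius to a fixed range. The helper name and the bounds of 2 and 30 pixels are arbitrary assumptions, not values from the notebook.

```python
def marker_radius(density, scale=2000.0, min_radius=2.0, max_radius=30.0):
    """Scale a density to a pixel radius, clamped to [min_radius, max_radius]."""
    return max(min_radius, min(max_radius, density / scale))

# Densities rounded from the table shown earlier
samples = [("15 Māyū", 1270.19), ("'Ābidīn", 24188.95), ("'Ain Schams", 76177.64)]
for name, density in samples:
    print(name, round(marker_radius(density), 2))
```

The returned value could be passed as `radius=marker_radius(density)` in the `folium.CircleMarker` call above.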

### Now that we have the demographics of all the main neighbourhoods in Cairo, it is time to find information about malls as well as something that indicates the income level.

#### Define Foursquare Credentials and Version

In [52]:
CLIENT_ID = 'WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4' # your Foursquare ID
CLIENT_SECRET = 'F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4
CLIENT_SECRET:F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB
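Hard-coding and printing API keys makes them easy to leak when a notebook is shared. A common alternative, sketched below, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this example.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names are hypothetical; use whatever your own setup defines.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(0, len(secret) - visible)

# Demo with placeholder values passed in explicitly instead of real keys
demo_env = {"FOURSQUARE_CLIENT_ID": "EXAMPLE_ID", "FOURSQUARE_CLIENT_SECRET": "EXAMPLE_SECRET"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
```

In a real run, calling `load_foursquare_credentials()` with no argument would read from the process environment.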


#### Let's first get all the banks in Cairo. Banks can show how wealthy a neighbourhood is.

In [48]:
import math # will be used to get the radius of the search query

In [66]:
#This block outputs Number_Banks, a list of the number of banks in each neighbourhood. Each neighbourhood is considered to be a circle, which is why the search radius in metres is calculated as math.sqrt(area/3.14)*1000
Number_Banks = []
LIMIT = 50
for lat, lng, area in zip(neighbourhoodsDF['Latitude'], neighbourhoodsDF['Longitude'], neighbourhoodsDF['Area/ sqkm']):
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,  
            lat, 
            lng,
            VERSION,
            'bank',
            math.sqrt(area/3.14)*1000, 
            LIMIT)
    print(url)
    results = requests.get(url).json()
    Number_Banks.append(len(results['response']['venues']))

https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=29.833,31.384&v=20180604&query=bank&radius=4919.4142885789925&limit=50
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.044,31.243&v=20180604&query=bank&radius=740.1153292811483&limit=50
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.041,31.258&v=20180604&query=bank&radius=771.71328955376&limit=50
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.039,31.205&v=20180604&query=bank&radius=1318.6559457207604&limit=50
https://api.foursquare.com/v2/venues/search?
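The search radius comes from treating each neighbourhood as a circle of the same area: A = πr², so r = sqrt(A/π), multiplied by 1000 to convert kilometres to metres. The sketch below reproduces that arithmetic and builds the same kind of query string with `urllib.parse.urlencode`, which escapes the values. The function names, the placeholder credentials and the rounding of the radius to whole metres are assumptions made for this example.

```python
import math
from urllib.parse import urlencode

def equivalent_radius_m(area_sqkm, pi=math.pi):
    """Radius in metres of a circle with the given area in square kilometres."""
    return math.sqrt(area_sqkm / pi) * 1000

def build_search_url(lat, lng, query, radius_m, client_id, client_secret,
                     version="20180604", limit=50):
    """Assemble a venues/search URL, letting urlencode escape the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": int(round(radius_m)),  # whole metres; a choice made for this sketch
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# 'Ābidīn has an area of 1.72 sq km; pi=3.14 matches the constant used in the notebook
radius = equivalent_radius_m(1.72, pi=3.14)
print(radius)
print(build_search_url(30.044, 31.243, "bank", radius, "YOUR_ID", "YOUR_SECRET"))
```

Using `math.pi` instead of 3.14 gives a slightly smaller radius, although the difference is negligible next to the circular-shape approximation itself.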

In [70]:
neighbourhoodsDF['Number of Banks'] = Number_Banks
neighbourhoodsDF.head()

| | Name | Population | Area/ sqkm | Link | Latitude | Longitude | Population Density | Number of Banks |
|---|---|---|---|---|---|---|---|---|
| 0 | 15 Māyū [15th of May City] | 96522.0 | 75.99 | /en/egypt/greatercairo/0104__15_māyū/ | 29.833 | 31.384 | 1270.193447 | 7 |
| 1 | 'Ābidīn | 41605.0 | 1.72 | /en/egypt/greatercairo/0116__ābidīn/ | 30.044 | 31.243 | 24188.953488 | 23 |
| 2 | Ad-Darb al-Aḥmar | 60336.0 | 1.87 | /en/egypt/greatercairo/0114__ad_darb_al_aḥmar/ | 30.041 | 31.258 | 32265.240642 | 2 |
| 3 | Ad-Duqqī | 73309.0 | 5.46 | /en/egypt/greatercairo/2103__ad_duqqī/ | 30.039 | 31.205 | 13426.556777 | 45 |
| 4 | 'Ain Schams | 633798.0 | 8.32 | /en/egypt/greatercairo/0134__ain_schams/ | 30.122 | 31.327 | 76177.644231 | 6 |


### The number of banks on its own does not accurately show how wealthy an area is.
The wealthier the area, the more banks are present per capita, because banks in wealthier areas serve fewer people than banks in poorer areas. So we will calculate the bank density, which is the number of banks divided by the population.

In [72]:
neighbourhoodsDF['Banks Density'] = neighbourhoodsDF['Number of Banks']/neighbourhoodsDF['Population']
neighbourhoodsDF.head()

| | Name | Population | Area/ sqkm | Link | Latitude | Longitude | Population Density | Number of Banks | Banks Density |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15 Māyū [15th of May City] | 96522.0 | 75.99 | /en/egypt/greatercairo/0104__15_māyū/ | 29.833 | 31.384 | 1270.193447 | 7 | 7.3e-05 |
| 1 | 'Ābidīn | 41605.0 | 1.72 | /en/egypt/greatercairo/0116__ābidīn/ | 30.044 | 31.243 | 24188.953488 | 23 | 0.000553 |
| 2 | Ad-Darb al-Aḥmar | 60336.0 | 1.87 | /en/egypt/greatercairo/0114__ad_darb_al_aḥmar/ | 30.041 | 31.258 | 32265.240642 | 2 | 3.3e-05 |
| 3 | Ad-Duqqī | 73309.0 | 5.46 | /en/egypt/greatercairo/2103__ad_duqqī/ | 30.039 | 31.205 | 13426.556777 | 45 | 0.000614 |
| 4 | 'Ain Schams | 633798.0 | 8.32 | /en/egypt/greatercairo/0134__ain_schams/ | 30.122 | 31.327 | 76177.644231 | 6 | 9e-06 |
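The raw ratios in the Banks Density column are small numbers that are hard to compare by eye. As an optional presentation aid, the same ratio can be scaled to banks per 100,000 residents. The helper below is a sketch with an illustrative name; it uses three rows from the table above.

```python
def banks_per_100k(number_of_banks, population):
    """Banks per 100,000 residents; the same ratio as Banks Density, scaled for readability."""
    return number_of_banks / population * 100_000

# (name, number of banks, population) taken from the table above
sample = [("'Ābidīn", 23, 41605.0), ("Ad-Duqqī", 45, 73309.0), ("'Ain Schams", 6, 633798.0)]
for name, banks, population in sample:
    print(name, round(banks_per_100k(banks, population), 1))
```

Because every search uses `limit=50`, a neighbourhood can never report more than 50 banks, so the ratio may understate the value for areas that reach that cap.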


### Now let's get the number of malls near each neighbourhood
We use the same method used to get the banks, but this time the malls are also saved in a separate DataFrame.

In [75]:
from pandas import json_normalize # flattens JSON into a DataFrame (pandas >= 1.0; older versions import it from pandas.io.json)

In [113]:
#This block outputs Number_Malls, a list of the number of malls in each neighbourhood, and Malls, a DataFrame of the malls found. Each neighbourhood is considered to be a circle, which is why the search radius in metres is calculated as math.sqrt(area/3.14)*1000
Number_Malls = []
Malls = pd.DataFrame(columns=['location.formattedAddress', 'location.lat', 'location.lng', 'name', 'Neighbourhood' ])
LIMIT = 50
for lat, lng, area, neighbourhood in zip(neighbourhoodsDF['Latitude'], neighbourhoodsDF['Longitude'], neighbourhoodsDF['Area/ sqkm'], neighbourhoodsDF['Name']):
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,  
            lat, 
            lng,
            VERSION,
            'mall', #some malls are included in shopping centre category
            math.sqrt(area/3.14)*1000, 
            LIMIT)
    print(url)
    results = requests.get(url).json()
    Number_Malls.append(len(results['response']['venues']))
    venues = results['response']['venues']
    dataframe = json_normalize(venues)
    dataframe['Neighbourhood'] = neighbourhood
    # a neighbourhood with no malls returns no venues, so the expected columns are missing
    try:
        FilteredDf = dataframe[['location.formattedAddress', 'location.lat', 'location.lng', 'name', 'Neighbourhood' ]]
        Malls = pd.concat([Malls, FilteredDf])
    except KeyError:
        print('df empty, area with no malls')

https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=29.833,31.384&v=20180604&query=mall&radius=4919.4142885789925&limit=50
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.044,31.243&v=20180604&query=mall&radius=740.1153292811483&limit=50
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.041,31.258&v=20180604&query=mall&radius=771.71328955376&limit=50
df empty
https://api.foursquare.com/v2/venues/search?client_id=WP53N4OBPVJNQZGWD0HLN53MDK2GIHWVVYGHPU0RKTUA4KO4&client_secret=F4QUBRVSVUXZ3CZAWKS34A2QQ0TLPGBSWNU1Z1W2N3IS3SWB&ll=30.039,31.205&v=20180604&query=mall&radius=1318.6559457207604&limit=50
https://api.foursquare.com/v2/venue

Clean the malls DataFrame.

In [116]:
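# Because all neighbourhoods are assumed to be circular, search areas overlap and the same mall can appear more than once.
# Note: keep=False drops every copy of a duplicated name; keep='first' would retain one copy instead.
Malls.drop_duplicates(subset ="name", keep = False, inplace = True)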
Malls.rename(columns={"location.formattedAddress": "Address", "location.lat": "Latitude", "location.lng": "Longitude", "name": "Name"}, inplace = True)
Malls.reset_index(inplace = True, drop = True)
print(Malls.shape)
Malls.head()

(212, 5)


| | Address | Latitude | Longitude | Name | Neighbourhood |
|---|---|---|---|---|---|
| 0 | [30 Talaat Harb St, وسط البلد, القاهرة, مصر] | 30.050162 | 31.239915 | Talaat Harb Mall (مول طلعت حرب) | 'Ābidīn |
| 1 | [20 Youssef El Gendy St (El Bustan St), باب ال... | 30.045942 | 31.239892 | El Nekhely (النخيلي) | 'Ābidīn |
| 2 | [Gawad Hosni St (Sherif St), عابدين, القاهرة, ... | 30.047894 | 31.24265 | El Bustan Cafe (قهوة البستان) | 'Ābidīn |
| 3 | [مصر] | 30.119197 | 31.32155 | Grand Mall - Elna'am | 'Ain Schams |
| 4 | [El Haram St, Al Haram, Muḩāfaz̧at al Jīzah, M... | 29.988053 | 31.145066 | Plaza Mall (بلازا مول) | Al-Ahrām |
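The `keep` argument of `drop_duplicates` decides what happens to names that occur more than once. With `keep=False`, as used above, every copy of a repeated name is removed, whereas `keep='first'` would retain one copy. The sketch below is a plain-Python illustration of those two policies on made-up mall names; it does not call pandas, and the function names are illustrative.

```python
from collections import Counter

def drop_all_duplicates(names):
    """Keep only names that occur exactly once (the effect of keep=False)."""
    counts = Counter(names)
    return [n for n in names if counts[n] == 1]

def keep_first_occurrence(names):
    """Keep the first copy of every name (the effect of keep='first')."""
    seen = set()
    kept = []
    for n in names:
        if n not in seen:
            seen.add(n)
            kept.append(n)
    return kept

names = ["Mall A", "Mall B", "Mall A", "Mall C"]  # made-up names
print(drop_all_duplicates(names))    # ['Mall B', 'Mall C']
print(keep_first_occurrence(names))  # ['Mall A', 'Mall B', 'Mall C']
```

If the goal is to list each mall once, the second policy is usually the one wanted; the first also discards malls that were found from two overlapping search circles.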


In [118]:
# Add the number of malls to neighbourhoodsDF
neighbourhoodsDF['Number of Malls'] = Number_Malls
neighbourhoodsDF.head(10)

| | Name | Population | Area/ sqkm | Link | Latitude | Longitude | Population Density | Number of Banks | Banks Density | Number of Malls |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15 Māyū [15th of May City] | 96522.0 | 75.99 | /en/egypt/greatercairo/0104__15_māyū/ | 29.833 | 31.384 | 1270.193447 | 7 | 7.3e-05 | 1 |
| 1 | 'Ābidīn | 41605.0 | 1.72 | /en/egypt/greatercairo/0116__ābidīn/ | 30.044 | 31.243 | 24188.953488 | 23 | 0.000553 | 5 |
| 2 | Ad-Darb al-Aḥmar | 60336.0 | 1.87 | /en/egypt/greatercairo/0114__ad_darb_al_aḥmar/ | 30.041 | 31.258 | 32265.240642 | 2 | 3.3e-05 | 0 |
| 3 | Ad-Duqqī | 73309.0 | 5.46 | /en/egypt/greatercairo/2103__ad_duqqī/ | 30.039 | 31.205 | 13426.556777 | 45 | 0.000614 | 1 |
| 4 | 'Ain Schams | 633798.0 | 8.32 | /en/egypt/greatercairo/0134__ain_schams/ | 30.122 | 31.327 | 76177.644231 | 6 | 9e-06 | 2 |
| 5 | Al-Ahrām | 681383.0 | 17.96 | /en/egypt/greatercairo/2108__al_ahrām/ | 29.994 | 31.138 | 37938.919822 | 12 | 1.8e-05 | 5 |
| 6 | Al-'Ajūzah | 287818.0 | 7.35 | /en/egypt/greatercairo/2102__al_ajūzah/ | 30.062 | 31.199 | 39158.911565 | 46 | 0.00016 | 8 |
| 7 | Al-Amīriīah | 157378.0 | 3.8 | /en/egypt/greatercairo/0131__al_amīriīah/ | 30.106 | 31.293 | 41415.263158 | 1 | 6e-06 | 1 |
| 8 | Al-Azbakiyah [Azbakeya] | 20393.0 | 1.33 | /en/egypt/greatercairo/0120__al_azbakiyah/ | 30.057 | 31.245 | 15333.082707 | 14 | 0.000687 | 1 |
| 9 | Al-Badrashayn [Badrshein] | 563294.0 | 135.1 | /en/egypt/greatercairo/2111__al_badrashayn/ | 29.823 | 31.258 | 4169.45966 | 2 | 4e-06 | 1 |


### Let's visualize the distribution of malls in Cairo using folium

In [119]:
# Look for the lat and lng of Cairo
latitude = 30.015
longitude = 31.313
# create map of Cairo using latitude and longitude values
map_malls = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighbourhood, name in zip(Malls['Latitude'], Malls['Longitude'], Malls['Neighbourhood'], Malls['Name']):
    
    label = '{}, {}'.format(name, neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_malls)  
    
map_malls