# Subject: Data Science Foundation

## Session 14 - ArcGIS API for Python

### Exercise 2 - Descriptive Statistics using an HTML table to Pandas DataFrame to Portal Item

Let us read the Wikipedia article "List of countries by cigarette consumption per capita".
It lists countries by annual per capita consumption of tobacco cigarettes.
We will explore the dataframe (descriptive statistics and correlation) and create a map.

https://en.wikipedia.org/wiki/List_of_countries_by_cigarette_consumption_per_capita

In [5]:
import pandas as pd

In [6]:
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_cigarette_consumption_per_capita")[0]

In [7]:
df.head()

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Ranking | Country/Territory | Number of cigarettes per person aged ≥ 15 per ... |
| 1 | 1 | Montenegro | 4124.53 |
| 2 | 2 | Belarus | 3831.62 |
| 3 | 3 | Lebanon | 3023.15 |
| 4 | 4 | Macedonia | 2732.23 |


In [8]:
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))
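
The header promotion above can also be sketched on a tiny stand-in table. This is only an illustration under assumptions: the three rows and the shortened column name `Cigarettes per year` are made up for the example, not read from Wikipedia. Passing the header as a plain list (rather than the row itself) avoids carrying the row label `0` over as the name of the column index, which is likely why a stray `0` shows up above the `df.dtypes` output later on.

```python
import pandas as pd

# A tiny stand-in for the raw table: the real header sits in row 0.
# The values are illustrative, not the full Wikipedia table.
raw = pd.DataFrame([
    ["Ranking", "Country/Territory", "Cigarettes per year"],
    ["1", "Montenegro", "4124.53"],
    ["2", "Belarus", "3831.62"],
])

# Keep only the data rows, then use the first row's values as column names.
promoted = raw.iloc[1:].copy()
promoted.columns = raw.iloc[0].tolist()

print(promoted)
```

The row labels still start at 1 here, which conveniently matches the ranking, so the index is left as it is.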

In [9]:
df.head()

| | Ranking | Country/Territory | Number of cigarettes per person aged ≥ 15 per year[7] |
|---|---|---|---|
| 1 | 1 | Montenegro | 4124.53 |
| 2 | 2 | Belarus | 3831.62 |
| 3 | 3 | Lebanon | 3023.15 |
| 4 | 4 | Macedonia | 2732.23 |
| 5 | 5 | Russia | 2690.33 |


In [25]:
df.rename(columns={'Ranking': 'Ranking', 'Country/Territory': 'Country', 'Number of cigarettes per person aged ≥ 15 per year[7]': 'Num of sig'}, inplace=True)

In [26]:
df.head()

| | Ranking | Country | Num of sig |
|---|---|---|---|
| 1 | 1 | Montenegro | 4124.53 |
| 2 | 2 | Belarus | 3831.62 |
| 3 | 3 | Lebanon | 3023.15 |
| 4 | 4 | Macedonia | 2732.23 |
| 5 | 5 | Russia | 2690.33 |


Let's check the data types of the columns

In [27]:
df.dtypes

0
Ranking       object
Country       object
Num of sig    object
dtype: object

In [29]:
# errors='coerce' sets every value that cannot be parsed to NaN
converted_column = pd.to_numeric(df["Num of sig"], errors='coerce')
df['Num of sig'] = converted_column


In [30]:
# errors='coerce' sets every value that cannot be parsed to NaN
converted_column = pd.to_numeric(df["Ranking"], errors='coerce')
df['Ranking'] = converted_column
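
To see what `errors='coerce'` does, here is a minimal sketch on a made-up series. The string `"n/a"` is an assumed example of a value that cannot be parsed; the real table may not contain any such value.

```python
import pandas as pd

# Two parseable strings and one that is not a number (illustrative values).
values = pd.Series(["4124.53", "3831.62", "n/a"])

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(values, errors="coerce")

# Counting the NaNs shows how many values were lost in the conversion.
n_missing = int(numeric.isna().sum())
print(numeric.dtype, n_missing)
```

Checking the number of NaN values after the conversion is a cheap way to notice whether any rows were silently turned into missing data.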


In [31]:
df.dtypes

0
Ranking         int64
Country        object
Num of sig    float64
dtype: object

In [32]:
df.shape 

(182, 3)

In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 182 entries, 1 to 182
Data columns (total 3 columns):
Ranking       182 non-null int64
Country       182 non-null object
Num of sig    182 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 5.7+ KB


Let's find the ranking position of our country

In [35]:
df.loc[df['Country'] == "Spain"]

| | Ranking | Country | Num of sig |
|---|---|---|---|
| 47 | 47 | Spain | 1264.74 |


In [36]:
df.loc[df['Country'] == "Russia"]

| | Ranking | Country | Num of sig |
|---|---|---|---|
| 5 | 5 | Russia | 2690.33 |
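
The two lookups above can be combined into one. The sketch below assumes a small hand-typed sample of four rows taken from the outputs shown earlier (the variable names `sample`, `wanted` and `selected` are illustrative) and uses `isin` to filter for several countries at once.

```python
import pandas as pd

# Four rows copied from the outputs above, not the full 182-row table.
sample = pd.DataFrame({
    "Ranking": [1, 2, 5, 47],
    "Country": ["Montenegro", "Belarus", "Russia", "Spain"],
    "Num of sig": [4124.53, 3831.62, 2690.33, 1264.74],
})

# isin() builds one boolean mask that matches any of the listed countries.
wanted = ["Spain", "Russia"]
selected = sample.loc[sample["Country"].isin(wanted)]

print(selected)
```

The rows come back in table order (Russia before Spain), not in the order of the `wanted` list.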


Let's check the descriptive statistics

In [37]:
df.describe()

| | Ranking | Num of sig |
|---|---|---|
| count | 182.0 | 182.0 |
| mean | 91.5 | 818.75544 |
| std | 52.683014 | 757.071004 |
| min | 1.0 | 14.96 |
| 25% | 46.25 | 213.755 |
| 50% | 91.5 | 569.115 |
| 75% | 136.75 | 1265.79 |
| max | 182.0 | 4124.53 |
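
The `Ranking` column of this summary is easy to sanity-check, because a ranking from 1 to 182 is just a sequence of consecutive integers. The sketch below rebuilds such a sequence (it does not use the real dataframe) and should reproduce the mean, standard deviation and quartiles reported above.

```python
import pandas as pd

# A ranking with no gaps or ties is simply the integers 1..182.
ranking = pd.Series(range(1, 183))

# describe() returns count, mean, std, min, quartiles and max.
stats = ranking.describe()

print(stats)
```

If the real `Ranking` column gave different numbers, that would hint at duplicated or missing ranks in the source table.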


> Put your code here

Let's rename the columns to prepare the data for a correlation analysis and also for mapping

> Put your code here

We need the "Number of cigarettes per person aged ≥ 15 per year[7]" column (Nrcigar_ppe) in numeric format. Let us convert it and, while doing so, turn any values that cannot be parsed into NaN, which stands for Not a Number.

> Put your code here

Repeat for the "Ranking" column

> Put your code here

Let's calculate the correlation
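
One possible way to approach this step is `DataFrame.corr`. The sketch below is not the exercise solution on the full data: it assumes a hand-typed sample of six rows copied from the outputs above. Because the ranking is derived from the consumption figures, the rank-based (Spearman) correlation should be a perfect -1, while the linear (Pearson) correlation is negative but weaker.

```python
import pandas as pd

# Six rows copied from the outputs above, not the full 182-row table.
sample = pd.DataFrame({
    "Ranking": [1, 2, 3, 4, 5, 47],
    "Num of sig": [4124.53, 3831.62, 3023.15, 2732.23, 2690.33, 1264.74],
})

# Pearson measures linear association between the two columns.
pearson = sample.corr(method="pearson").loc["Ranking", "Num of sig"]

# Spearman compares the orderings only, so it ignores the size of the gaps.
spearman = sample.corr(method="spearman").loc["Ranking", "Num of sig"]

print(pearson, spearman)
```

On the real dataframe the same call would be `df[["Ranking", "Num of sig"]].corr()`, assuming both columns have already been converted to numeric types.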

> Put your code here

## Plot as a map

Let us connect to our GIS to geocode this data and present it as a map

In [42]:
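from arcgis.gis import GIS
from getpass import getpass
import json
# Prompt for the password instead of storing it in the notebook
gis = GIS("https://www.arcgis.com", "MariiaShcherbiakBTS", getpass("Password: "))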

In [43]:
fc = gis.content.import_data(df, {"CountryCode":"Country"})

In [44]:
map1 = gis.map('Spain')

In [45]:
map1

Let us use smart mapping to render the points with varying sizes representing the number of cigarettes per person aged ≥ 15 per year
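
Before adding the layer, it may help to see what "classed" sizes mean. The sketch below is only a conceptual illustration with pandas and is not how ArcGIS computes its class breaks: it assumes the quartiles from `df.describe()` as break values, and the symbol sizes 4, 8, 12 and 16 are hypothetical.

```python
import pandas as pd

# Three values copied from the outputs above.
consumption = pd.Series(
    [4124.53, 2690.33, 1264.74],
    index=["Montenegro", "Russia", "Spain"],
)

# Assumed class breaks: min, 25%, 50%, 75% and max from df.describe().
breaks = [14.96, 213.755, 569.115, 1265.79, 4124.53]

# Hypothetical symbol sizes, one per class.
sizes = [4, 8, 12, 16]

# Every value falls into one class and gets that class's symbol size.
symbol_size = pd.cut(consumption, bins=breaks, labels=sizes, include_lowest=True)

print(symbol_size)
```

All countries within one class share a symbol size, which is the difference from a renderer that scales each symbol continuously with the value.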

In [46]:
map1.add_layer(fc, {"renderer":"ClassedSizeRenderer",
               "field_name": "Num of sig"})

> Put your code here

Let us publish this layer as a feature collection item in our GIS

In [47]:
item_properties = {
    "title": "Number of cigarettes per person aged",
    "tags" : "cigarettes",
    "snippet": "Number of cigarettes per person aged",
    "description": "test description",
    "text": json.dumps({"featureCollection": {"layers": [dict(fc.layer)]}}),
    "type": "Feature Collection",
    "typeKeywords": "Data, Feature Collection, Singlelayer",
    "extent" : "-102.5272,-41.7886,172.5967,64.984"
}

item = gis.content.add(item_properties)
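
The `text` and `extent` properties above are plain strings, so their shape can be inspected without a GIS connection. The sketch below uses a placeholder `layer` dictionary (in the notebook it would come from `dict(fc.layer)`) and assumes the extent string follows the order xmin, ymin, xmax, ymax.

```python
import json

# Placeholder layer; a real one would come from dict(fc.layer).
layer = {
    "layerDefinition": {"name": "cigarettes"},
    "featureSet": {"features": []},
}

# The item text is the feature collection serialized as a JSON string.
text = json.dumps({"featureCollection": {"layers": [layer]}})

# Round-trip the string to confirm that it is valid JSON.
decoded = json.loads(text)

# Split the extent string into its four corner coordinates.
extent = "-102.5272,-41.7886,172.5967,64.984"
xmin, ymin, xmax, ymax = (float(v) for v in extent.split(","))

print(len(decoded["featureCollection"]["layers"]), xmin, ymin, xmax, ymax)
```

A quick check that `xmin < xmax` and `ymin < ymax` catches swapped coordinates before the item is published.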

Let us search for this item

In [48]:
search_result = gis.content.search("Number of cigarettes per person aged")
search_result[0]