# Background

As data scientists, our main work always revolves around data. But have we ever stopped to ask where our data comes from, or how it was collected? To be a good data scientist, I believe knowledge of and experience in collecting data are quite important. Have you ever struggled to find data? Have you ever struggled to label data? Do you believe that data should be free and easy to access? It's Data for Democracy!

To make data accessible to everyone, the service provider also has to set limits in order to reduce excessive use and misuse of the data by irresponsible parties. To address this problem, we can use an API (Application Programming Interface). With an API, we can control how our data is accessed publicly.

In this Capstone, we will try to build an API using Python and Flask so that our data can be accessed publicly. Conceptually, we will build a Python application with Flask that can handle a user's request, read the data, and send back a response.


**Data used:**
- books_c.csv

**Environment:**
- python 
- pandas
- flask 
- gunicorn


**Goals**
1. Build a Flask app that works as an API and serves data in JSON format
2. Build at least 2 static endpoints and at least 1 dynamic endpoint using routing
3. Deploy the Flask app to Heroku

*Note: endpoints that are already given as examples will not be counted as endpoints produced for the capstone.*
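
As a rough illustration of goals 1 and 2, the sketch below shows one static and one dynamic endpoint that return JSON. It is a minimal sketch, not the capstone solution: the route names (`/books`, `/books/author/<name>`), the `as_json` helper, and the `BOOKS` sample frame with its values are all hypothetical stand-ins for `books_c.csv`.

```python
import pandas as pd
from flask import Flask, Response

app = Flask(__name__)

# Hypothetical in-memory sample standing in for data/books_c.csv.
BOOKS = pd.DataFrame({
    "bookID": [1, 2, 3],
    "title": ["Book A", "Book B", "Book C"],
    "authors": ["AuthorX", "AuthorY", "AuthorX"],
})


def as_json(frame):
    """Serialize a DataFrame to a JSON response (one object per row)."""
    return Response(frame.to_json(orient="records"), mimetype="application/json")


@app.route("/books")
def all_books():
    # Static endpoint: the URL is fixed and always returns every row.
    return as_json(BOOKS)


@app.route("/books/author/<name>")
def books_by_author(name):
    # Dynamic endpoint: <name> is captured from the URL and used as a filter.
    return as_json(BOOKS[BOOKS["authors"] == name])
```

The difference between the two kinds of endpoint is only in the route: a static route has a fixed path, while a dynamic route contains a `<variable>` part that Flask passes to the function as an argument.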

# Building a Python API in 6 Minutes
*Disclaimer: This is a short course that introduces students to the world of the **backend**. It leaves out a great many aspects of how APIs work today. To keep things simple, we only cover the basic concept of an API and implement it as a Flask app.*

We will build a Flask app that serves as an API, so if you do not have the `Flask` library yet, install it with `pip install flask`. Below are the libraries we will need. Try importing them first before installing them.
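
One way to check which of these libraries are already available, without triggering an `ImportError`, is sketched below. The `missing_packages` helper is a hypothetical name used only for this illustration; it relies on the standard-library `importlib.util.find_spec`.

```python
import importlib.util

# The four libraries used in this notebook.
REQUIRED = ["flask", "pandas", "requests", "gunicorn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Anything printed here still needs a `pip install`.
print(missing_packages(REQUIRED))
```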

In [227]:
# !pip install flask
# !pip install pandas
# !pip install requests
# !pip install gunicorn

In [228]:
import flask

In [229]:
import pandas as pd

In [230]:
import requests

In [231]:
import gunicorn

In [232]:
# Note: both lines set the same option, so the second assignment overrides the first
pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.options.display.float_format = '{:,}'.format

In [233]:
books = pd.read_csv('data/books_c.csv',index_col=0)

In [234]:
books.head()

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling,4.56,0439785960,9780439785969,eng,652,1944099,26249
2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling,4.49,0439358078,9780439358071,eng,870,1996446,27613
3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling,4.47,0439554934,9780439554930,eng,320,5629932,70390
4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.41,0439554896,9780439554893,eng,352,6267,272
5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling,4.55,043965548X,9780439655484,eng,435,2149872,33964


In [235]:
#Get to know the data first! Details below.
books.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13714 entries, 1 to 47709
Data columns (total 9 columns):
title                 13714 non-null object
authors               13714 non-null object
average_rating        13714 non-null float64
isbn                  13714 non-null object
isbn13                13714 non-null int64
language_code         13714 non-null object
# num_pages           13714 non-null int64
ratings_count         13714 non-null int64
text_reviews_count    13714 non-null int64
dtypes: float64(1), int64(4), object(4)
memory usage: 1.0+ MB


In [236]:
# The column dtypes are:
# - 4 object
# - 4 int64
# - 1 float64
books.dtypes

title                  object
authors                object
average_rating        float64
isbn                   object
isbn13                  int64
language_code          object
# num_pages             int64
ratings_count           int64
text_reviews_count      int64
dtype: object

In [237]:
#No missing values
books.isna().sum()

title                 0
authors               0
average_rating        0
isbn                  0
isbn13                0
language_code         0
# num_pages           0
ratings_count         0
text_reviews_count    0
dtype: int64

In [238]:
#No duplicate rows
books.duplicated().sum()

0

In [239]:
#The data frame contains 13714 rows and 9 columns, which matches the books.info() output above
books.shape

(13714, 9)

In [240]:
#.size is the number of elements (rows x columns = 13714 x 9), not the memory footprint; books.info() reports about 1.0+ MB
books.size

123426
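
`shape`, `size`, and memory usage answer three different questions, which is easy to mix up. The sketch below uses a small hypothetical frame (`demo`, not the capstone data) to show how they relate.

```python
import pandas as pd

# Hypothetical toy frame, used only to compare the three attributes.
demo = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

rows, cols = demo.shape            # (number of rows, number of columns)
n_elements = demo.size             # total number of cells = rows * cols
memory_bytes = int(demo.memory_usage(deep=True).sum())  # footprint in bytes

print(rows, cols, n_elements, memory_bytes)
```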

In [243]:
#Find out which author appears most often in the books_c.csv data frame
books['authors'].value_counts()

Agatha Christie                                    69
Stephen King                                       66
Orson Scott Card                                   48
Rumiko Takahashi                                   46
P.G. Wodehouse                                     42
                                                   ..
Hubert Selby Jr.-Darren Aronofsky-Richard Price     1
David Sedaris-Joe Mantello                          1
Dan Brown-Raúl Amundaray                            1
Dale Carnegie-Dorothy Carnegie                      1
Gabriel García Márquez-Remy Gorga Filho             1
Name: authors, Length: 7599, dtype: int64

In [244]:
#Filter the rows for the most frequently listed author
author = books[(books.authors == 'Agatha Christie')]

In [245]:
#Make another copy and assign it to a new variable (it is overwritten in the next cell)
book_authr = books.copy()

In [246]:
book_authr = author

In [251]:
book_authr['average_rating']=book_authr['average_rating'].round(1)
book_authr

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  book_authr['average_rating']=book_authr['average_rating'].round(1)


bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
16297,Cards on the Table (Hercule Poirot #15),Agatha Christie,3.9,0425205959,9780425205952,eng,324,23561,936
16298,A Murder Is Announced (Miss Marple #5),Agatha Christie,4.0,1579126294,9781579126292,eng,288,31050,1097
16299,And Then There Were None,Agatha Christie,4.2,0312330871,9780312330873,eng,264,558832,16793
16300,Sleeping Murder (Miss Marple #13),Agatha Christie,3.9,0002317850,9780002317856,eng,242,18550,811
16303,The Hollow (Hercule Poirot #26),Agatha Christie,3.8,0007121024,9780007121021,eng,384,13022,606
...,...,...,...,...,...,...,...,...,...
37563,Mörderblumen,Agatha Christie,3.7,3502509131,9783502509134,ger,158,7,0
37574,The Mirror Crack'd,Agatha Christie,3.9,0553350153,9780553350159,eng,212,62,10
37575,Dead Man's Mirror,Agatha Christie,3.8,0553350749,9780553350746,eng,233,30,1
37582,Miss Marple Meets Murder: The Mirror Crack'd /...,Agatha Christie,4.2,2731800097,9780800327101,eng,661,172,9
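
The warning shown with this output appears because `book_authr` is a filtered slice of `books`, so pandas cannot tell whether the assignment is meant to change the original. A minimal sketch of one common way to avoid it is to take an explicit `.copy()` of the filtered rows before assigning. The `books_demo` and `author_demo` names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the books data frame.
books_demo = pd.DataFrame({
    "authors": ["A", "B", "A"],
    "average_rating": [3.94, 4.56, 4.04],
})

# .copy() makes the filtered rows an independent frame, so the assignment
# below is unambiguous and leaves books_demo untouched.
author_demo = books_demo[books_demo["authors"] == "A"].copy()
author_demo["average_rating"] = author_demo["average_rating"].round(1)

print(author_demo)
```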


In [262]:
book_authr['language_code'].value_counts()
book_authr['language_code'] = book_authr['language_code'].astype('category')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  book_authr['language_code'] = book_authr['language_code'].astype('category')


In [298]:
#Frequency table: how many books fall on each average_rating value in book_authr
pd.crosstab(
    index=book_authr['average_rating'], 
    columns="count").sort_values('count',ascending =False)

average_rating,count
3.8,18
4.0,15
3.9,13
4.2,9
3.7,7
3.6,3
4.3,2
4.1,1
4.4,1


In [272]:
book_authr

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
16297,Cards on the Table (Hercule Poirot #15),Agatha Christie,3.9,0425205959,9780425205952,eng,324,23561,936
16298,A Murder Is Announced (Miss Marple #5),Agatha Christie,4.0,1579126294,9781579126292,eng,288,31050,1097
16299,And Then There Were None,Agatha Christie,4.2,0312330871,9780312330873,eng,264,558832,16793
16300,Sleeping Murder (Miss Marple #13),Agatha Christie,3.9,0002317850,9780002317856,eng,242,18550,811
16303,The Hollow (Hercule Poirot #26),Agatha Christie,3.8,0007121024,9780007121021,eng,384,13022,606
...,...,...,...,...,...,...,...,...,...
37563,Mörderblumen,Agatha Christie,3.7,3502509131,9783502509134,ger,158,7,0
37574,The Mirror Crack'd,Agatha Christie,3.9,0553350153,9780553350159,eng,212,62,10
37575,Dead Man's Mirror,Agatha Christie,3.8,0553350749,9780553350746,eng,233,30,1
37582,Miss Marple Meets Murder: The Mirror Crack'd /...,Agatha Christie,4.2,2731800097,9780800327101,eng,661,172,9


In [273]:
#stack() returns a Series, but it still gives an organized glimpse of the data frame because it is ordered by bookID
book_authr.stack()

bookID                    
16297   title                 Cards on the Table (Hercule Poirot  #15)
        authors                                        Agatha Christie
        average_rating                                             3.9
        isbn                                                0425205959
        isbn13                                           9780425205952
                                                ...                   
39954   isbn13                                           9780739460870
        language_code                                              eng
        # num_pages                                                568
        ratings_count                                               43
        text_reviews_count                                           6
Length: 621, dtype: object

In [274]:
#unstack() is personally not my favourite here: the outer index level is the column name (starting with title), which is less organized than stack()
book_authr.unstack()

                    bookID
title               16297     Cards on the Table (Hercule Poirot  #15)
                    16298      A Murder Is Announced (Miss Marple  #5)
                    16299                     And Then There Were None
                    16300            Sleeping Murder (Miss Marple #13)
                    16303             The Hollow (Hercule Poirot  #26)
                                                ...                   
text_reviews_count  37563                                            0
                    37574                                           10
                    37575                                            1
                    37582                                            9
                    39954                                            6
Length: 621, dtype: object

In [282]:
#melt() keeps the same 69 rows as book_authr, but with only 2 columns: variable and value
book_authr.melt(value_vars=['text_reviews_count'])

,variable,value
0,text_reviews_count,936
1,text_reviews_count,1097
2,text_reviews_count,16793
3,text_reviews_count,811
4,text_reviews_count,606
...,...,...
64,text_reviews_count,0
65,text_reviews_count,10
66,text_reviews_count,1
67,text_reviews_count,9
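
To make the difference between the three reshaping calls above easier to see, here is a small sketch on a hypothetical two-row frame (`demo`, with made-up values). It also keeps `bookID` in the melted result through `id_vars`, which is an addition to what the cell above does.

```python
import pandas as pd

# Hypothetical toy frame indexed by bookID, like books_c.csv.
demo = pd.DataFrame(
    {"ratings_count": [10, 20], "text_reviews_count": [1, 2]},
    index=pd.Index([101, 102], name="bookID"),
)

# stack(): Series indexed by (bookID, column name), grouped per book.
stacked = demo.stack()

# unstack() on a single-level index: Series indexed by (column name, bookID).
unstacked = demo.unstack()

# melt(): long table; id_vars keeps bookID next to each value.
melted = demo.reset_index().melt(id_vars="bookID",
                                 value_vars=["text_reviews_count"])

print(stacked, unstacked, melted, sep="\n\n")
```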


In [300]:
language = pd.crosstab(index=book_authr['language_code'], 
            columns='mean', 
            values=book_authr['ratings_count'],
            aggfunc='mean')

In [315]:
language.groupby(['language_code']).mean()#.plot.bar()

language_code,mean
en-GB,85.0
en-US,83.0
eng,27179.92307692308
ger,31.5


In [309]:
pd.crosstab(index=book_authr['authors'],
            columns=[book_authr['language_code'], ['authors']],
            values=book_authr['text_reviews_count'],
            aggfunc='median')

language_code,en-GB,en-US,eng,ger
col_1,authors,authors,authors,authors
authors,,,,
Agatha Christie,7.0,4.0,488.0,1.5


In [310]:
book_authr.to_csv('book_author.csv')

In [266]:
#The most common book language is English,
#so I use a condition to get all the information for the books in English
#books['language_code'].value_counts()
#condition = books[(books.language_code == 'eng')]

In [119]:
#Make a copy, so I can start over again and again!
book_eng = books.copy()

In [148]:
#Round first, then display, so the preview shows the rounded ratings
book_eng['average_rating']=book_eng['average_rating'].round(1)
#book_eng = condition
book_eng.head()

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling,4.6,0439785960,9780439785969,eng,652,1944099,26249
2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling,4.5,0439358078,9780439358071,eng,870,1996446,27613
3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling,4.5,0439554934,9780439554930,eng,320,5629932,70390
4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.4,0439554896,9780439554893,eng,352,6267,272
5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling,4.6,043965548X,9780439655484,eng,435,2149872,33964


In [124]:
#Create a second condition for book rating with more than 4.0
condition2 = book_eng[(book_eng.average_rating > 4.0)]

In [126]:
#book_eng = condition2
book_eng

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling,4.6,0439785960,9780439785969,eng,652,1944099,26249
2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling,4.5,0439358078,9780439358071,eng,870,1996446,27613
3,Harry Potter and the Sorcerer's Stone (Harry P...,J.K. Rowling,4.5,0439554934,9780439554930,eng,320,5629932,70390
4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.4,0439554896,9780439554893,eng,352,6267,272
5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling,4.6,043965548X,9780439655484,eng,435,2149872,33964
...,...,...,...,...,...,...,...,...,...
47644,Lirael: Daughter of the Clayr (The Abhorsen Tr...,Garth Nix-Tim Curry,4.3,0807205583,9780807205587,eng,15,23,3
47674,Papa Please Get the Moon for Me,Eric Carle,4.2,0887081770,9780887081774,eng,40,4359,209
47697,The Sandman: King of Dreams,Alisa Kwitney-Neil Gaiman,4.6,0811835928,9780811835923,eng,180,12775,34
47708,The Faeries' Oracle,Brian Froud-Jessica Macbeth,4.4,0743201116,9780743201117,eng,224,1550,38


In [162]:
pd.crosstab(index=book_eng['average_rating'], columns="count")

average_rating,count
4.1,1250
4.2,1192
4.3,622
4.4,392
4.5,156
4.6,81
4.7,27
4.8,9
4.9,2
5.0,24


In [265]:
#book_eng['authors'].value_counts()

In [158]:
book_eng[(book_eng.authors == 'P.G. Wodehouse')]

bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
12550,Stiff Upper Lip Jeeves (Jeeves #13),P.G. Wodehouse,4.3,184159105X,9781841591056,eng,211,6720,296
16374,The Most of P.G. Wodehouse,P.G. Wodehouse,4.5,0743203585,9780743203586,eng,701,1845,103
16376,Full Moon (Blandings Castle #7),P.G. Wodehouse,4.2,1585678368,9781585678365,eng,272,1769,124
16377,How Right You Are Jeeves (Jeeves #12),P.G. Wodehouse,4.2,0743203593,9780743203593,eng,206,2690,148
16379,Life With Jeeves (Jeeves #6 2 & 4),P.G. Wodehouse,4.4,0140059024,9780140059021,eng,557,3143,140
16384,Spring Fever,P.G. Wodehouse,4.1,158567575X,9781585675753,eng,276,468,28
16385,Lord Emsworth Acts for the Best,P.G. Wodehouse,4.1,0141185740,9780141185743,eng,182,187,11
16387,Carry on Jeeves (Jeeves #3),P.G. Wodehouse,4.3,1585673927,9781585673926,eng,273,14989,725
16388,Lord Emsworth and Others (Blandings Castle #5.5),P.G. Wodehouse,4.2,1585672777,9781585672776,eng,268,1057,71
16390,The Mating Season (Jeeves #9),P.G. Wodehouse,4.3,1585672319,9781585672318,eng,272,4394,266


- Create a third condition and a new variable for P.G. Wodehouse, because this author appears most often in the data frame of books rated above 4.0

In [164]:
PGwodehouse= book_eng[(book_eng.authors == 'P.G. Wodehouse')]

In [205]:
PGwodehouse.to_csv('PGwodehouse.csv')

In [216]:
import requests
import pandas as pd

In [221]:
url1 = 'http://127.0.0.1:5000/data/get/%3CPGwodehouse%3E'
r = requests.get(url1)
r_pd = pd.DataFrame(r.json())

In [222]:
r_pd.head()

,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,12550,Stiff Upper Lip Jeeves (Jeeves #13),P.G. Wodehouse,4.3,184159105X,9781841591056,eng,211,6720,296
1,16374,The Most of P.G. Wodehouse,P.G. Wodehouse,4.5,0743203585,9780743203586,eng,701,1845,103
2,16376,Full Moon (Blandings Castle #7),P.G. Wodehouse,4.2,1585678368,9781585678365,eng,272,1769,124
3,16377,How Right You Are Jeeves (Jeeves #12),P.G. Wodehouse,4.2,0743203593,9780743203593,eng,206,2690,148
4,16379,Life With Jeeves (Jeeves #6 2 & 4),P.G. Wodehouse,4.4,0140059024,9780140059021,eng,557,3143,140


In [223]:
url2 = 'http://127.0.0.1:5000/data/get/equal/%3CPGwodehouse%3E/%3Cisbn%3E/184159105X'
r = requests.get(url2)
r_pd = pd.DataFrame(r.json())
r_pd.head()

,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,12550,Stiff Upper Lip Jeeves (Jeeves #13),P.G. Wodehouse,4.3,184159105X,9781841591056,eng,211,6720,296
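
The `%3C` and `%3E` sequences in the URLs above are the percent-encoded forms of `<` and `>`, so the request is literally asking for `<PGwodehouse>` and `<isbn>`. The sketch below shows how such a URL can be assembled with the standard library. The `equal_url` helper is a hypothetical name, and whether the angle brackets are actually required depends on how the routes are defined in the Flask app, which is not shown in this notebook.

```python
from urllib.parse import quote, unquote

BASE = "http://127.0.0.1:5000"  # local Flask address used in this notebook


def equal_url(data_name, column, value):
    """Build a /data/get/equal/... URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (data_name, column, value)]
    return f"{BASE}/data/get/equal/" + "/".join(parts)


# Decoding shows what the server receives for the first path segment.
print(unquote("%3CPGwodehouse%3E"))
print(equal_url("<PGwodehouse>", "<isbn>", "184159105X"))
```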


In [326]:
url3 = 'http://127.0.0.1:5000/Agatha_Christie'
r = requests.get(url3)
r_pd = pd.DataFrame(r.json())

In [327]:
r_pd.head()

,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
0,16297,Cards on the Table (Hercule Poirot #15),Agatha Christie,3.9,425205959,9780425205952,eng,324,23561,936
1,16298,A Murder Is Announced (Miss Marple #5),Agatha Christie,4.0,1579126294,9781579126292,eng,288,31050,1097
2,16299,And Then There Were None,Agatha Christie,4.2,312330871,9780312330873,eng,264,558832,16793
3,16300,Sleeping Murder (Miss Marple #13),Agatha Christie,3.9,2317850,9780002317856,eng,242,18550,811
4,16303,The Hollow (Hercule Poirot #26),Agatha Christie,3.8,7121024,9780007121021,eng,384,13022,606
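
Note that the `isbn` values in this response have lost their leading zeros (`425205959` here versus `0425205959` in the notebook earlier). One likely cause, assumed here because the server code is not shown, is that the saved CSV was read back with `isbn` inferred as an integer column. A minimal sketch of the effect and of a possible fix with `dtype` follows; the two-row CSV text reuses ISBNs from the table above.

```python
import io

import pandas as pd

# Two rows taken from the Agatha Christie table above, as CSV text.
csv_text = "bookID,isbn\n16297,0425205959\n16300,0002317850\n"

# Default parsing infers integers, which drops the leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps each ISBN exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"isbn": str})

print(inferred["isbn"].tolist())
print(as_text["isbn"].tolist())
```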


In [328]:
url4 = 'http://127.0.0.1:5000/data/get/%3CAgatha_Christie%3E/%3Cisbn%3E/0312330871'
r = requests.get(url4)
r_pd = pd.DataFrame(r.json())
r_pd.head()

,bookID,title,authors,average_rating,isbn,isbn13,language_code,# num_pages,ratings_count,text_reviews_count
2,16299,And Then There Were None,Agatha Christie,4.2,312330871,9780312330873,eng,264,558832,16793


# Test Your API Endpoints

Once our app has been deployed successfully, we can try accessing it from a browser or from a Jupyter notebook (recommended). Let's try some of the endpoints we tested locally.

Note that there is no need to include a port in the Heroku URL, because we have already configured it in the `Procfile`.
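
Heroku passes the port to the running process through the `PORT` environment variable, which is why the public URL carries no port. A minimal sketch of reading it in Python follows. The `resolve_port` helper and the fallback of 5000 (the port used in the local URLs in this notebook) are assumptions for illustration, and the actual `Procfile` is not shown here.

```python
import os


def resolve_port(default=5000):
    """Return the port from the PORT environment variable, or a local default."""
    return int(os.environ.get("PORT", default))


print(resolve_port())
```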

In [None]:
import requests
import pandas as pd 
heroku_url = 'https://algo-capstone.herokuapp.com/data/get/books_c.csv'
r = requests.get(heroku_url)
r_pd = pd.DataFrame(r.json())
r_pd