## 5. Scraping for Streamlit App

The end goal of the project is to build a Streamlit app where a user can try out the recommender system.  
For the modeling portion, scikit-surprise required only the username, manga title and user rating.  
Hence, only those fields were collected during the initial scraping.  
However, to improve the user experience in the Streamlit app, it is better to display other relevant information about each manga title as well.  
This includes a cover image, a synopsis and the title's URL on MyAnimeList (MAL), where all other information can be found.  
Lastly, each title's MAL ID is also scraped, so that any further information can be fetched from the Jikan API by that ID.
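
As a rough illustration of the ID-based lookup mentioned above, the sketch below wraps a single call to the client's `manga(id)` method. The helper name `fetch_manga_by_id` is hypothetical, and the handling of a top-level `"data"` key is an assumption about how newer Jikan versions wrap their payload; it is worth checking against the installed jikanpy version.

```python
def fetch_manga_by_id(client, mal_id):
    """Return the full record for one manga, looked up by its MAL ID.

    `client` is assumed to be a jikanpy `Jikan` instance. Any object with a
    compatible `manga(id)` method also works, which allows offline testing.
    """
    response = client.manga(mal_id)
    # Assumption: newer Jikan versions nest the record under "data",
    # while older ones return it at the top level. Handle both shapes.
    return response.get("data", response)


# Hypothetical usage (performs a network request):
# from jikanpy import Jikan
# details = fetch_manga_by_id(Jikan(), 8897)
```

Passing the client in as an argument, rather than creating it inside the helper, keeps the function easy to exercise with a stub.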

In [1]:
import requests
import pandas as pd
from tqdm import tqdm
from jikanpy import Jikan
import time

In [2]:
jikan = Jikan()

In [4]:
reading_no_zero_df = pd.read_csv('../data/reading_no_zero.csv')

In [12]:
title_list = reading_no_zero_df['item'].unique().tolist()
#title_list = ['Ayashimon','Berserk']
all_info = []
for title in tqdm(title_list):
    try:
        # Search MAL through Jikan and keep the first hit
        # (named search_result to avoid shadowing the built-in dict)
        search_result = jikan.search('manga', title)
        top_hit = search_result['results'][0]
        title_info = [
            title,                # search title
            top_hit['title'],     # result title
            top_hit['mal_id'],
            top_hit['url'],
            top_hit['image_url'],
            top_hit['synopsis'],
        ]
        all_info.append(title_info)
        # Pause between requests to stay within the API rate limit
        time.sleep(2)
        # Save progress after every title so an interruption does not lose earlier results
        all_info_df = pd.DataFrame(all_info)
        all_info_df.to_csv('../data/all_info_final.csv', index = False)
    except Exception:
        # Skip titles whose search fails or returns no results
        continue

title_info

100%|███████████████████████████████████████████████████████████████████████████| 2907/2907 [18:16:42<00:00, 22.64s/it]


['Akaneiro ni Somaru Saka',
 'Akaneiro ni Somaru Saka',
 8897,
 'https://myanimelist.net/manga/8897/Akaneiro_ni_Somaru_Saka',
 'https://cdn.myanimelist.net/images/manga/1/111947.jpg?s=280debd0a4af45420cba296285f30288',
 "Jun'ichi Nagase attends a prestigious high school. He has the nickname Geno Killer since he was rebellious in middle school. This is used, inadvertently, to help a girl named Yuuhi Katagiri from troub..."]

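The loop above silently skips any title whose search fails, so the skipped titles cannot be retried later. A minimal sketch of one way to keep track of them is shown below. The names `first_hit_row` and `collect_rows` are hypothetical, and `search` is assumed to be any callable that takes a title and returns a response shaped like the one used above (a dict with a `'results'` list).

```python
FIELDS = ("title", "mal_id", "url", "image_url", "synopsis")


def first_hit_row(search_title, response):
    """Build one output row from a search response, or None if there is no hit."""
    results = response.get("results") or []
    if not results:
        return None
    hit = results[0]
    # Missing fields become None instead of raising KeyError
    return [search_title] + [hit.get(field) for field in FIELDS]


def collect_rows(titles, search):
    """Return (rows, failed), where failed lists (title, reason) pairs."""
    rows, failed = [], []
    for title in titles:
        try:
            row = first_hit_row(title, search(title))
        except Exception as exc:
            failed.append((title, repr(exc)))
            continue
        if row is None:
            failed.append((title, "no results"))
        else:
            rows.append(row)
    return rows, failed


# Hypothetical usage with the notebook's client (performs network requests;
# rate-limit pauses such as time.sleep would still be needed inside `search`):
# rows, failed = collect_rows(title_list, lambda t: jikan.search('manga', t))
```

Keeping the failed titles in a separate list makes it possible to rerun only those searches instead of repeating the whole scrape.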
In [34]:
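# The keys are strings ('0'..'5'), so this rename only matches string column labels,
# e.g. after the CSV has been read back with pd.read_csv
all_info_df.rename(columns = {'0':'search title','1':'result title','2':'mal_id','3':'url','4':'image','5':'synopsis'}, inplace = True)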
all_info_df.head()

| Unnamed: 0 | search title | result title | mal_id | url | image | synopsis |
|---|---|---|---|---|---|---|
| 0 | 20th Century Boys | 20th Century Boys | 3 | https://myanimelist.net/manga/3/20th_Century_Boys | https://cdn.myanimelist.net/images/manga/1/544... | As the 20th century approaches its end, people... |
| 1 | Akumetsu | Akumetsu | 1101 | https://myanimelist.net/manga/1101/Akumetsu | https://cdn.myanimelist.net/images/manga/2/183... | Due to an economic downturn plaguing Japan, Sh... |
| 2 | Ayashimon | Ayashimon | 141583 | https://myanimelist.net/manga/141583/Ayashimon | https://cdn.myanimelist.net/images/manga/1/256... | Years ago, the death of the infamous Enma Synd... |
| 3 | Dandadan | Dandadan | 135496 | https://myanimelist.net/manga/135496/Dandadan | https://cdn.myanimelist.net/images/manga/2/248... | After being aggressively rejected, Momo Ayase ... |
| 4 | Jigokuraku | Jigokuraku | 112318 | https://myanimelist.net/manga/112318/Jigokuraku | https://cdn.myanimelist.net/images/manga/2/208... | Gabimaru the Hollow, a ninja of Iwagakure Vill... |
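
Because each row keeps only the first search hit, the `result title` may not always be the manga that was searched for. The sketch below shows one possible sanity check that flags rows where the two titles differ after ignoring case and extra whitespace. The helper name `find_title_mismatches` is hypothetical, and it works on plain dictionaries so it does not depend on pandas.

```python
def find_title_mismatches(rows):
    """Return the rows whose search title and result title differ.

    `rows` is an iterable of dicts with 'search title' and 'result title'
    keys. The comparison ignores case and surrounding or repeated whitespace.
    """
    def normalise(value):
        return " ".join(str(value).casefold().split())

    return [
        row for row in rows
        if normalise(row["search title"]) != normalise(row["result title"])
    ]


# Hypothetical usage with the DataFrame from this notebook:
# mismatches = find_title_mismatches(all_info_df.to_dict("records"))
```

Flagged rows are only candidates for manual review, since a different spelling does not necessarily mean the wrong manga was returned.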


In [64]:
all_info_df.to_csv('../data/all_info_final_2.csv', index = False)