# Scraping Infinite Scroll on AniList

Infinite scrolling is a web-design technique that loads content continuously as the user scrolls down the page, eliminating the need for pagination. It is familiar from social media sites such as Twitter; see the Nielsen Norman Group article on [infinite scrolling](https://www.nngroup.com/articles/infinite-scrolling/).

In [44]:
from selenium import webdriver
import time

In [45]:
# Open a Chrome browser controlled by Selenium
# Optionally maximize the window with driver.maximize_window()
# Compare with PhantomJS (a headless browser)
driver = webdriver.Chrome()

In [46]:
url = "https://anilist.co/search/anime?year=2020%25&season=WINTER"

In [47]:
driver.get(url)
print(driver.current_url)
driver.implicitly_wait(30)

https://anilist.co/search/anime?year=2020%25&season=WINTER


### Analyzing the AniList page

Scroll down to the bottom of the page and watch what happens: new anime items appear a few seconds later.

How can we scroll down automatically with Selenium?
Use [`execute_script`](https://selenium-python.readthedocs.io/api.html#selenium.webdriver.remote.webdriver.WebDriver.execute_script) to run the JavaScript [`window.scrollTo`](https://developer.mozilla.org/en-US/docs/Web/API/Window/scrollTo) method:

```python
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
```

### The usual scraping approach

1. Inspect the structure of the page you want to extract data from: all the information for each anime sits inside a `div.media-card` tag.
2. AniList's markup is well organized, so we can extract the cover, data, airing-countdown, extra, description and genres by selecting their class names.

In [48]:
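# Note: Selenium 4.3+ removed the find_element(s)_by_* helpers. There, use
# driver.find_elements(By.CLASS_NAME, "media-card") together with
# `from selenium.webdriver.common.by import By`.
results = driver.find_elements_by_class_name("media-card")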

In [49]:
len(results)

20

In [50]:
for anime in results:
    print(anime.text)

Haikyuu!! TO THE TOP
Production I.G
Ep 12 - 4d 11h 26m
TV
84%
The fourth season of Haikyuu!!

The Karasuno High School Volleyball Club finally won their way into the nationals after an intense battle for the Miyagi Prefecture Spring Tournament qualifiers. As they were preparing for the nationals, Kageyama is invited to go to All-Japan Youth Training Camp. At the same time, Tsukishima is invited to go to a special rookie select training camp for first-years in Miyagi Prefecture. Hinata feels panic that he’s being left behind as one of the first-years and then decides to show up at the Miyagi Prefecture rookie select training camp anyway...

(Source: Crunchyroll)
Comedy , Drama , Sports
Eizouken ni wa Te wo Dasu na!
Science SARU
Winter 2020
TV
79%
First year high schooler Midori Asakusa loves anime so much, she insists that "concept is everything" in animation. Though she draws a variety of ideas in her sketchbook, she hasn't taken the first step to creating anime, insisting that she can

In [51]:
winter_anime = []

In [52]:
for anime in results:
    cover = anime.find_element_by_class_name("cover")
    # .extra holds two lines: the format (e.g. "TV") and a second value
    extra = anime.find_element_by_class_name("extra").text.split("\n")
    genres = anime.find_element_by_class_name("genres").text
    # The cover text holds the title on line 1 and the studio on line 2
    c_text = cover.text.split("\n")
    winter_anime.append({
        'cover_title': c_text[0],
        'cover_studio': c_text[1],
        'cover_img': cover.get_attribute('data-src'),
        'format': extra[0],
        # Stored as 'duration', but the value in the output is a percentage such as '84%'
        'duration': extra[1],
        'description': anime.find_element_by_class_name("description").text,
        'genres': genres.split(',')
    })
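
The `genres.split(',')` call keeps the spaces around each comma, so the stored list looks like `['Comedy ', ' Drama ', ' Sports']` in the output of `winter_anime`. A minimal sketch of a cleanup step, assuming the genres text is always comma-separated as in that output; `split_genres` is a hypothetical helper name, not part of the original notebook:

```python
def split_genres(genres_text):
    """Split a comma-separated genres string and trim each entry.

    Hypothetical helper: assumes text such as 'Comedy , Drama , Sports'.
    Empty entries (for example from an empty string) are dropped.
    """
    return [part.strip() for part in genres_text.split(",") if part.strip()]


print(split_genres("Comedy , Drama , Sports"))
```

To use it in the loop, `'genres': split_genres(genres)` could replace `'genres': genres.split(',')`.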

In [53]:
winter_anime

[{'cover_title': 'Haikyuu!! TO THE TOP',
  'cover_studio': 'Production I.G',
  'cover_img': 'https://s4.anilist.co/file/anilistcdn/media/anime/cover/large/bx106625-UR22wB2NuNVi.png',
  'format': 'TV',
  'duration': '84%',
  'description': 'The fourth season of Haikyuu!!\n\nThe Karasuno High School Volleyball Club finally won their way into the nationals after an intense battle for the Miyagi Prefecture Spring Tournament qualifiers. As they were preparing for the nationals, Kageyama is invited to go to All-Japan Youth Training Camp. At the same time, Tsukishima is invited to go to a special rookie select training camp for first-years in Miyagi Prefecture. Hinata feels panic that he’s being left behind as one of the first-years and then decides to show up at the Miyagi Prefecture rookie select training camp anyway...\n\n(Source: Crunchyroll)',
  'genres': ['Comedy ', ' Drama ', ' Sports']},
 {'cover_title': 'Eizouken ni wa Te wo Dasu na!',
  'cover_studio': 'Science SARU',
  'cover_img':

### Save to JSON

In [54]:
import json
f=open('ScrapingAnimelist.json','w')

In [55]:
json.dump(winter_anime,f)

In [56]:
f.close()
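
Opening, writing, and closing the file in three separate cells works, but the file stays open if a cell in between fails. A minimal sketch of the same save using a `with` block; `save_records` and `load_records` are hypothetical helper names, and the `ensure_ascii=False` and `indent=2` options are choices made here for readability, not something the original notebook uses:

```python
import json


def save_records(records, path):
    """Write a list of dicts to `path` as JSON and close the file reliably."""
    # The with-block closes the file even if json.dump raises.
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)


def load_records(path):
    """Read the JSON file back, for example to check what was saved."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the list built above, the call would be `save_records(winter_anime, 'ScrapingAnimelist.json')`.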

### Scraping with infinite scroll (5 pages)

In [57]:
driver.get(url)
print(driver.current_url)
driver.implicitly_wait(30)

https://anilist.co/search/anime?year=2020%25&season=WINTER


In [59]:
# 4 scrolls plus the initial page load gives 5 pages of results
for i in range(1, 5):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    # Pause to give the new anime items time to load
    # Watch the Chrome window: the page scrolls down automatically
    time.sleep(20)
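
A fixed `time.sleep(20)` waits the full 20 seconds even when the new cards arrive much sooner. A minimal sketch of a polling wait, assuming the number of loaded cards is a good signal that a new page has arrived; `wait_for_growth` and its parameters are hypothetical names introduced for illustration. It accepts any zero-argument function that returns the current count, so with this notebook's driver you could pass something like `lambda: len(driver.find_elements_by_class_name('media-card'))`:

```python
import time


def wait_for_growth(count_fn, previous, timeout=20.0, poll=0.5):
    """Poll count_fn() until it exceeds `previous` or `timeout` seconds pass.

    Returns the last observed count, so the caller can compare it with
    `previous` to see whether anything new was loaded.
    """
    deadline = time.monotonic() + timeout
    current = count_fn()
    while current <= previous and time.monotonic() < deadline:
        time.sleep(poll)
        current = count_fn()
    return current
```

If the returned value equals `previous`, nothing new appeared within the timeout, which is a reasonable point to stop scrolling.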

In [16]:
results = driver.find_elements_by_class_name("media-card")
len(results)

85

In [17]:
winter_anime = []

In [18]:
for anime in results:
    cover = anime.find_element_by_class_name("cover")
    extra = anime.find_element_by_class_name("extra").text.split("\n")
    genres = anime.find_element_by_class_name("genres").text
    c_text = cover.text.split("\n")
    winter_anime.append({
        'cover_title': c_text[0],
        'cover_studio': "" if len(c_text) < 2 else c_text[1],
        'cover_img': cover.get_attribute('data-src'),
        'format': extra[0],
        'duration': "" if len(extra) < 2 else extra[1],
        'description': anime.find_element_by_class_name("description").text,
        'genres': genres.split(',')
    })

In [19]:
winter_anime

[{'cover_title': 'Haikyuu!! TO THE TOP',
  'cover_studio': 'Production I.G',
  'cover_img': 'https://s4.anilist.co/file/anilistcdn/media/anime/cover/large/bx106625-UR22wB2NuNVi.png',
  'format': 'TV',
  'duration': '84%',
  'description': 'The fourth season of Haikyuu!!\n\nThe Karasuno High School Volleyball Club finally won their way into the nationals after an intense battle for the Miyagi Prefecture Spring Tournament qualifiers. As they were preparing for the nationals, Kageyama is invited to go to All-Japan Youth Training Camp. At the same time, Tsukishima is invited to go to a special rookie select training camp for first-years in Miyagi Prefecture. Hinata feels panic that he’s being left behind as one of the first-years and then decides to show up at the Miyagi Prefecture rookie select training camp anyway...\n\n(Source: Crunchyroll)',
  'genres': ['Comedy ', ' Drama ', ' Sports']},
 {'cover_title': 'Eizouken ni wa Te wo Dasu na!',
  'cover_studio': 'Science SARU',
  'cover_img':

### Save to JSON

In [20]:
import json
f=open('ScrapingAnimelist2.json','w')

In [21]:
json.dump(winter_anime,f)

In [22]:
f.close()

### Quit the browser

In [23]:
driver.quit()

# Assignment: Scraping the Manga List

On the same website, scrape the infinite-scroll manga list at https://anilist.co/search/manga?year=2020%25



## Scraping the Infinite-Scroll Manga List

In [60]:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
import time

In [61]:
driver = webdriver.Chrome()

In [62]:
url = "https://anilist.co/search/manga?year=2020%25"

In [63]:
driver.get(url)
print(driver.current_url)
driver.implicitly_wait(30)

https://anilist.co/search/manga?year=2020%25


In [64]:
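def scrollToBottom(driver, timeout):
    """Scroll down until the page height stops growing.

    `timeout` is the pause in seconds after each scroll.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Jump to the current bottom of the page to trigger the next load
        driver.execute_script(
            'window.scrollTo(0, document.body.scrollHeight);')

        # Cards from earlier pages already satisfy this condition, so the
        # pause below is what actually gives new items time to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, 'media-card')))

        time.sleep(timeout)

        # Stop once scrolling no longer makes the page taller
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break

        last_height = new_height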
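
The loop above only ends when the page height stops changing, so a page that keeps growing would keep it running. A minimal sketch of a capped variant, assuming the driver object only needs an `execute_script` method; `scroll_until_stable`, `pause`, and `max_scrolls` are hypothetical names introduced here, and the cap of 50 is an arbitrary default:

```python
import time

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight);"
HEIGHT_JS = "return document.body.scrollHeight"


def scroll_until_stable(driver, pause=1.0, max_scrolls=50):
    """Scroll to the bottom until the height stops changing or the cap is hit.

    Returns the number of scrolls that made the page taller.
    """
    last_height = driver.execute_script(HEIGHT_JS)
    grew = 0
    for _ in range(max_scrolls):
        driver.execute_script(SCROLL_JS)
        time.sleep(pause)  # give the page time to append new cards
        new_height = driver.execute_script(HEIGHT_JS)
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
        grew += 1
    return grew
```

Because the function only calls `execute_script`, it can be exercised with a small stand-in object before pointing it at a real browser.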

In [65]:
scrollToBottom(driver,1)

In [66]:
result = driver.find_elements_by_class_name('media-card')

In [67]:
len(result)

230

In [68]:
anime_manga = []

In [69]:
for anime in result:
    cover = anime.find_element_by_class_name("cover")
    extra = anime.find_element_by_class_name("extra").text.split("\n")
    genres = anime.find_element_by_class_name("genres").text
    c_text = cover.text.split("\n")
    anime_manga.append({
        'cover_title': c_text[0],
        'cover_studio': "" if len(c_text) < 2 else c_text[1],
        'cover_img': cover.get_attribute('data-src'),
        'format': extra[0],
        'duration': "" if len(extra) < 2 else extra[1],
        'description': anime.find_element_by_class_name("description").text,
        'genres': genres.split(',')
    })
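
The same extraction loop now appears three times in this notebook. A minimal sketch of factoring the text handling into one function that works on plain strings, so it can be checked without a browser; `card_to_record` is a hypothetical name, and the field layout simply mirrors the dictionaries built above:

```python
def card_to_record(cover_text, cover_img, extra_text, description, genres_text):
    """Build one record from the text of a card's sub-elements.

    Mirrors the loops in this notebook: a missing second line becomes "".
    """
    c_text = cover_text.split("\n")
    extra = extra_text.split("\n")
    return {
        "cover_title": c_text[0],
        "cover_studio": c_text[1] if len(c_text) > 1 else "",
        "cover_img": cover_img,
        "format": extra[0],
        "duration": extra[1] if len(extra) > 1 else "",
        "description": description,
        "genres": genres_text.split(","),
    }
```

Inside the loop, the Selenium calls would then only fetch the strings (for example `cover.text` and `cover.get_attribute('data-src')`) and pass them to this function.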

In [70]:
anime_manga

[{'cover_title': 'Death Note: Tokubetsu Yomikiri',
  'cover_studio': '',
  'cover_img': 'https://s4.anilist.co/file/anilistcdn/media/manga/cover/large/bx115122-HWsbTNeZyoFD.jpg',
  'format': 'One Shot',
  'duration': '74%',
  'description': "The chapter will center on Ryuk's Death Note being brought again to the human world, after the end of the main manga.",
  'genres': ['Mystery ', ' Supernatural']},
 {'cover_title': 'Kakkou no Iinazuke',
  'cover_studio': '',
  'cover_img': 'https://s4.anilist.co/file/anilistcdn/media/manga/cover/large/bx114383-05Uo5j9nIzf8.png',
  'format': 'Manga',
  'duration': '64%',
  'description': "Kakkou no Iinazuke's story revolves around 2 teenagers who got switched soon after their birth. Hoping to keep both of their biological and adopted children, their parents decided to engage the boy and the girl.",
  'genres': ['Comedy ', ' Romance']},
 {'cover_title': 'MASHLE',
  'cover_studio': '',
  'cover_img': 'https://s4.anilist.co/file/anilistcdn/media/manga/

### Save to JSON

In [74]:
import json
f=open('ScrapingAnimelist3.json','w')

In [75]:
json.dump(anime_manga,f)

In [76]:
f.close()

### Quit the browser

In [77]:
driver.quit()