# Web Scraping using Python and BeautifulSoup

This code implements a web scraper to extract information about anime series from the website "findanime.net". The following information is extracted for each anime:

- Title
- Short title
- URL to the anime page on the website
- Image source
- Anime tags
- Translation status
- Anime description
- Count of episodes
- Rating

The extracted information is stored in a CSV file named "anime.csv".

### Importing Required Libraries

The code starts by importing the required libraries. The following libraries are used:

Tool | Purpose
--- | ---
`Python 3` | Programming language.
`csv` | For writing the scraped data to a CSV file.
`requests` | To send HTTP requests to the website.
`BeautifulSoup` | For parsing the HTML content of the website.
`regex` | Third-party regular expression module (imported as `re`), used to compile patterns.
`requests_html` | To create an HTML session object for sending HTTP requests.
`numpy` | For converting a list to a numpy array.

### Sending HTTP Requests
An HTML session object is created using the `HTMLSession` class from the `requests_html` library. This object is used to send HTTP requests to the website. The URL of the webpage to scrape and the request headers are defined in a later cell.


In [410]:
# Importing required libraries 
import csv
import requests
from bs4 import BeautifulSoup
import regex as re
from requests_html import HTMLSession
import numpy as np

### Creating an HTML Session Object
A session object is created using `HTMLSession`. It keeps cookies and connection state across the HTTP requests that follow; the headers are passed with each request.

In [411]:
# Create an HTML session object
session = HTMLSession()

### Defining the URL and Headers
The base URL of the findanime.net listing page is defined. It ends with `offset=` so that a numeric offset can be appended for each page. The headers to include in each HTTP request are defined as well.

In [412]:
# URL of the webpage to scrape
url = "https://findanime.net/list?sortType=USER_RATING&offset="

# headers to include in the request
headers = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 YaBrowser/23.1.1.1138 Yowser/2.5 Safari/537.36"
}

### Creating the Anime List
A list called anime_list is created to store the extracted anime information.

In [413]:
# Create a list to store the anime information
anime_list = []

### Sending GET Requests and Parsing the HTML

A `for` loop sends one GET request per page of the anime list (70 anime per page). `range(1, 80)` produces 79 page indices, which are multiplied by 70 to give offsets from 70 to 5530. Offset 0, the first page of the list, is not requested.
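The offset arithmetic can be checked without touching the network. The following is a minimal sketch; `BASE_URL`, `PAGE_SIZE`, and `page_urls` are hypothetical names introduced here for illustration and do not appear in the notebook.

```python
# Base URL copied from the notebook; the offset is appended to it.
BASE_URL = "https://findanime.net/list?sortType=USER_RATING&offset="
PAGE_SIZE = 70  # items per page, as stated in the text


def page_urls(first_page: int, last_page: int) -> list[str]:
    """Build listing URLs for page indices first_page..last_page-1 (range semantics)."""
    return [BASE_URL + str(page * PAGE_SIZE) for page in range(first_page, last_page)]


urls = page_urls(1, 80)
print(len(urls))  # number of requests the loop would make
print(urls[0])    # first URL requested
print(urls[-1])   # last URL requested
```

Calling `page_urls(0, 80)` instead would also include the first page of the list (offset 0).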

The `session.get` method sends the GET request with the URL and headers. The HTML content of the page is then read from the response object.

The `BeautifulSoup` constructor parses the HTML content into a soup object that can be easily searched and manipulated.

The main container for the list of anime is found using the `soup.find` method, which searches for a `div` element with the class `tiles row`.
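The parsing steps can be tried offline on a small HTML string. This is a minimal sketch: the markup in `SAMPLE_HTML` is made up and only imitates the class names used in this notebook, so it should not be read as the real page structure. It also shows one way to guard against a missing container or a missing optional element.

```python
from bs4 import BeautifulSoup

# Made-up markup that only imitates the class names used in the notebook.
SAMPLE_HTML = """
<div class="tiles row">
  <div class="tile col-sm-6">
    <a class="non-hover" href="/example_one"><img title="Example One"></a>
    <span class="mangaTranslationCompleted">done</span>
  </div>
  <div class="tile col-sm-6">
    <a class="non-hover" href="/example_two"><img title="Example Two"></a>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match the container by the exact value of its class attribute.
container = soup.find("div", {"class": "tiles row"})

records = []
if container is not None:  # find() returns None when nothing matches
    # class_="tile" matches any div whose class list contains "tile".
    for item in container.find_all("div", class_="tile"):
        link = item.find("a", {"class": "non-hover"})
        status = item.find("span", {"class": "mangaTranslationCompleted"})
        records.append({
            "title": link.find("img")["title"],
            "href": link["href"],
            # Fall back to a default when the optional element is absent.
            "translation": status.text if status else "no translation",
        })

print(records)
```

Checking for `None` before calling `find_all` avoids an `AttributeError` if a page is returned without the expected container.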

In [None]:
for i in range(1, 80):
    # Send a GET request to the website using the session and headers
    response = session.get(url+str(i*70), headers=headers)
    html_content = response.content

    # Use BeautifulSoup to parse the HTML content
    soup = BeautifulSoup(html_content, "html.parser")

    # Find the main container for the list of anime
    anime_container = soup.find("div", {"class": "tiles row"})

    # Find all the anime items in the container
    anime_items = anime_container.find_all(class_=re.compile(r'tile col-sm-6 *'))
    
    # Loop through each anime item and extract the necessary information
    for anime_item in anime_items:
        anime_title = anime_item.find('img')['title']
        anime_short_title = anime_item.find("div", {"class": "desc"}).find("a")['title']
        anime_href = 'https://findanime.net' + anime_item.find('a', {"class": "non-hover"})['href']
        anime_image = anime_item.find('a', {"class": "non-hover"}).find('img')['data-original']
        anime_tags = anime_item.find_all("span", {"class": "badge badge-light"})
        anime_tags = [tag.text for tag in anime_tags]
        anime_translation = anime_item.find("span", {"class": "mangaTranslationCompleted"})
        anime_description = anime_item.find("div", {"class": "manga-description"}).text[2:]
        anime_count_ep = anime_item.find("span", {"class": "badge badge-secondary amount-badge"})
        if anime_count_ep:
            anime_count_ep = int(anime_count_ep.text)
        else:
            anime_count_ep = 0
        if anime_translation:
            anime_translation = anime_translation.text
        else:
            anime_translation = 'нет перевода'  # Russian for "no translation"
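        anime_rating = anime_item.find('b', {"class": "rate-value"})
        # Rebuild the rating from fixed character positions in the element text
        anime_rating = float(anime_rating.text[10] + anime_rating.text[28:30])
        # Split the rating into integer and fractional parts to match the CSV header
        anime_rate_int = int(anime_rating)
        anime_rate_fraction = round(anime_rating - anime_rate_int, 1)

        # Add the extracted information to the anime list
        anime_list.append([anime_title,
                           anime_short_title,
                           anime_href,
                           anime_image,
                           anime_tags,
                           anime_translation,
                           anime_description,
                           anime_count_ep,
                           anime_rate_int,
                           anime_rate_fraction])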

In [None]:
# Convert the anime list to a NumPy object array
anime_list = np.asarray(anime_list, dtype=object)

### Writing the Anime List to a CSV File

We create a new CSV file named `anime.csv` using the `open` function. The file is opened in write mode (`"w"`) with `newline=""`, as the `csv` module expects, and the encoding is set to `"utf-8"` so that non-ASCII characters such as Cyrillic text are stored correctly.

### Adding Headers

Next, we use the `writerow` method of the `csv.writer` object to write the headers to the first line of the CSV file. The headers are a list of strings naming the columns in the file.

### Adding Data

Finally, we use the `writerows` method to write the data in the `anime_list` array to the CSV file. Each row of the array is written to a separate line of the file.
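The header and data steps can be sketched on an in-memory buffer instead of a file. The column names and rows below are made-up sample values. The sketch also illustrates a side effect worth knowing: `csv.writer` stores a Python list such as the tags as its string representation, so it has to be parsed again after reading, here with `ast.literal_eval`.

```python
import ast
import csv
import io

# Made-up sample data with a list-valued column, like the tags in the scraper.
header = ["Title", "Tags", "Count episodes"]
rows = [
    ["Example One", ["drama", "comedy"], 12],
    ["Example Two", [], 1],
]

# Write the header first, then all data rows.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

# Read the data back; every field comes back as a string.
buffer.seek(0)
restored = [
    {
        "Title": row["Title"],
        "Tags": ast.literal_eval(row["Tags"]),  # "['drama', 'comedy']" -> list
        "Count episodes": int(row["Count episodes"]),
    }
    for row in csv.DictReader(buffer)
]
print(restored)
```

The header must contain exactly as many names as each data row has fields; otherwise the columns will be misaligned when the file is read back.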

In [None]:
# Write as a CSV file with headers on first line
with open("anime.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    # Add Headers
    writer.writerow(['Title',
                     'Short title',
                     'Href',
                     'Image src',
                     'Tags',
                     'Translation',
                     'Description',
                     'Count episodes',
                     'Rate int',
                     'Rate fraction'])
    # Add Data
    writer.writerows(anime_list)

In [414]:
# Import the pandas library
import pandas as pd

In [415]:
# Read the CSV file into a pandas DataFrame
df = pd.read_csv("anime.csv",  encoding="utf-8")

In [416]:
# First five rows
df.head(5)

| | Title | Short title | Href | Image src | Tags | Translation | Description | Count episodes | Rate int | Rate fraction |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Семья шпиона (Spy x Family) | Семья шпиона | https://findanime.net/semia_shpiona__A5678 | https://static.findanime.net/uploads/pics/01/3... | ['детектив', 'повседневность', 'сёнэн', 'драма... | переведено | На редкость забавная история об иде... | 12 | 4 | 0.8 |
| 1 | Магистр дьявольского культа 3 (The Founder of ... | Магистр дьявольского культа 3 | https://findanime.net/magistr_diavolskogo_kulta_3 | https://static.findanime.net/uploads/pics/01/2... | ['исторический', 'драма', 'мистика', 'романтик... | переведено | Продолжение аниме по новелле Мосян ... | 12 | 4 | 0.8 |
| 2 | Крутой учитель Онидзука (Great Teacher Onizuka... | Крутой учитель Онидзука | https://findanime.net/great_teacher_onizuka | https://static.findanime.net/uploads/pics/00/1... | ['комедия', 'повседневность', 'сёнэн', 'драма'... | переведено | Главный персонаж «GTO» — молодой па... | 43 | 4 | 0.8 |
| 3 | Форма голоса (The Shape of Voice: Koe no Katachi) | Форма голоса | https://findanime.net/forma_golosa__A206e69 | https://static.findanime.net/uploads/pics/00/5... | ['психология', 'сёнэн', 'школа', 'драма', 'ром... | переведено | Адаптация одноименной манги. Истори... | 1 | 4 | 0.7 |
| 4 | Синий экзорцист [Фильм] (Blue Exorcist The Mov... | Синий экзорцист [Фильм] | https://findanime.net/sinii_ekzorcist__film_ | https://static.findanime.net/uploads/pics/00/2... | ['сёнэн', 'школа', 'драма', 'мистика', 'комеди... | переведено | Каждые одиннадцать лет Академия Ист... | 1 | 4 | 0.8 |


In [417]:
# Dimensions of the dataset
df.shape

(5530, 10)