## Creating a DataFrame from a dict

In [3]:
import pandas as pd

data = {
    "Name": ['Alice', 'Bob', 'Charlie'],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
df

| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |


In [18]:
oldest = df["Age"]

In [19]:
oldest

0    25
1    30
2    35
Name: Age, dtype: int64

In [20]:
oldest.idxmax()

2

In [21]:
oldest.idxmin()

0

In [24]:
oldest_idx = df["Age"].idxmax()

In [25]:
oldest_record = df.loc[oldest_idx]
oldest_record

Name    Charlie
Age          35
City    Chicago
Name: 2, dtype: object
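
The `idxmax()` plus `.loc` lookup above can be wrapped in a small helper. This is only a sketch: `row_with_max` is a hypothetical name, not a pandas function, and the data is rebuilt here so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

def row_with_max(frame, column):
    """Return the row (as a Series) where `column` holds its largest value."""
    # idxmax() gives the index label of the maximum; .loc fetches that row
    return frame.loc[frame[column].idxmax()]

oldest_record = row_with_max(df, "Age")
youngest_record = df.loc[df["Age"].idxmin()]

print(oldest_record["Name"])
print(youngest_record["Name"])
```

Note that `idxmax()` returns an index label, not a position, so it pairs with `.loc` rather than `.iloc`.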

In [27]:
## Creating a DataFrame from a list of lists
data2 = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "Los Angeles"],
    ["Charlie", 35, "Chicago"],
]

df2 = pd.DataFrame(data2, columns=['Name', 'Age', 'City'])
df2

| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
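
As a quick check, the two constructions (dict of columns, list of rows) should produce the same table. A minimal sketch, with `from_dict` and `from_list` as illustrative names:

```python
import pandas as pd

# Column-oriented: one key per column, one list of values per key
from_dict = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Row-oriented: one inner list per row, column names passed separately
from_list = pd.DataFrame(
    [
        ["Alice", 25, "New York"],
        ["Bob", 30, "Los Angeles"],
        ["Charlie", 35, "Chicago"],
    ],
    columns=["Name", "Age", "City"],
)

# equals() compares shape, labels, dtypes and values
print(from_dict.equals(from_list))
```

The dict form is handy when the data already arrives per column; the list form fits data that arrives record by record.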


# API

## Importing JSON data with a GET request

In Next.js/JavaScript I used to write:
```
fetch('https://api.example.com/data', {
  method: 'GET',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <token>',
    'Accept': 'application/json',
    'Custom-Header': 'value'
  }
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
```

In Python we use the `requests` library instead, which does essentially the same thing:

In [6]:
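import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/json/"

response = requests.get(url)
response.raise_for_status()  # Fail early on an HTTP error status

data = response.json()

# The payload keeps the rows under the 'records' key
df3 = pd.DataFrame(data['records'])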
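
The `fetch` call earlier also passes custom headers. With `requests` the rough equivalent is the `headers` argument. The sketch below reuses the placeholder URL and token from the JavaScript snippet (neither is a real endpoint) and only prepares the request without sending it, so it runs offline.

```python
import requests

# Placeholder endpoint and token taken from the JavaScript snippet; both are illustrative
url = "https://api.example.com/data"
headers = {
    "Authorization": "Bearer <token>",
    "Accept": "application/json",
    "Custom-Header": "value",
}

# Build the request without sending it, to inspect what would go on the wire
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.method, prepared.url)
print(prepared.headers["Accept"])

# To actually send it (requires a real endpoint and token):
# response = requests.get(url, headers=headers, timeout=10)
# response.raise_for_status()
# data = response.json()
```

`response.json()` plays the role of the first `.then(response => response.json())`, and `raise_for_status()` roughly covers what the `.catch` would handle for HTTP errors.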

In [7]:
df3.head()

| | dateRep | day | month | year | cases | deaths | countriesAndTerritories | geoId | countryterritoryCode | popData2019 | continentExp | Cumulative_number_for_14_days_of_COVID-19_cases_per_100000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14/12/2020 | 14 | 12 | 2020 | 746 | 6 | Afghanistan | AF | AFG | 38041757.0 | Asia | 9.01377925 |
| 1 | 13/12/2020 | 13 | 12 | 2020 | 298 | 9 | Afghanistan | AF | AFG | 38041757.0 | Asia | 7.05277624 |
| 2 | 12/12/2020 | 12 | 12 | 2020 | 113 | 11 | Afghanistan | AF | AFG | 38041757.0 | Asia | 6.86876792 |
| 3 | 11/12/2020 | 11 | 12 | 2020 | 63 | 10 | Afghanistan | AF | AFG | 38041757.0 | Asia | 7.13426564 |
| 4 | 10/12/2020 | 10 | 12 | 2020 | 202 | 16 | Afghanistan | AF | AFG | 38041757.0 | Asia | 6.96865815 |
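
The `dateRep` column arrives as day/month/year text. A possible next step, sketched here on three rows copied from the output above (the `sample` and `date` names are illustrative), is to parse it into real dates before sorting or aggregating.

```python
import pandas as pd

# Three records copied from the df3.head() output, reduced to the columns used here
records = [
    {"dateRep": "14/12/2020", "cases": 746, "deaths": 6},
    {"dateRep": "13/12/2020", "cases": 298, "deaths": 9},
    {"dateRep": "12/12/2020", "cases": 113, "deaths": 11},
]
sample = pd.DataFrame(records)

# Give the format explicitly so 12/12/2020 vs 13/12/2020 is never read month-first
sample["date"] = pd.to_datetime(sample["dateRep"], format="%d/%m/%Y")

# Oldest day first
sample = sample.sort_values("date").reset_index(drop=True)

print(sample[["date", "cases", "deaths"]])
print(sample["cases"].sum())  # 1157
```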


## Web scraping
**Web scraping** is a technique for extracting data from websites: an automated process collects information from a web page and converts it into a usable structured format, such as a CSV file or a database.

### BeautifulSoup
BeautifulSoup is a widely used Python library for parsing HTML and XML. It makes it simple to extract data from web pages in a structured way, which is why it is a staple of web scraping. Its name is a nod to "tag soup", the informal term for messy, malformed HTML that the library is designed to cope with.
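
To get a feel for the API before hitting a real site, here is a minimal sketch that parses a hand-written HTML string. The markup is invented for illustration and is not the structure of any real page.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page (illustrative markup only)
html = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li><a href="/books/1" title="First Book">First Book</a> <span class="price">£10.00</span></li>
    <li><a href="/books/2" title="Second Book">Second Book</a> <span class="price">£12.50</span></li>
  </ul>
</body></html>
"""

# Naming the parser explicitly keeps results consistent across machines
soup = BeautifulSoup(html, "html.parser")

books = []
for item in soup.find_all("li"):
    link = item.find("a")                         # first <a> inside the <li>
    price = item.find("span", class_="price")     # filter by CSS class
    books.append({
        "title": link["title"],                   # attributes are read like dict keys
        "href": link["href"],
        "price": price.get_text(strip=True),
    })

print(books)
```

`find` returns the first match (or `None`), `find_all` returns every match, and `class_` has a trailing underscore because `class` is a reserved word in Python.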

## Exercise: scraping books

In [16]:
import requests
from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/"

response = requests.get(base_url)

soup = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning

links = soup.find_all("a")

link = links[0]

# print(links)

print([repr(obj) for obj in links])

['<a href="index.html">Books to Scrape</a>', '<a href="index.html">Home</a>', '<a href="catalogue/category/books_1/index.html">\n                            \n                                Books\n                            \n                        </a>', '<a href="catalogue/category/books/travel_2/index.html">\n                            \n                                Travel\n                            \n                        </a>', '<a href="catalogue/category/books/mystery_3/index.html">\n                            \n                                Mystery\n                            \n                        </a>', '<a href="catalogue/category/books/historical-fiction_4/index.html">\n                            \n                                Historical Fiction\n                            \n                        </a>', '<a href="catalogue/category/books/sequential-art_5/index.html">\n                            \n                                Sequential Art\n    
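
The `href` values in this output are relative to the page they came from. One way to turn them into full URLs is `urljoin` from the standard library. This is a sketch using two links copied from the output above.

```python
from urllib.parse import urljoin

base_url = "https://books.toscrape.com/"

# Two hrefs copied from the output above
hrefs = [
    "index.html",
    "catalogue/category/books/travel_2/index.html",
]

# urljoin resolves each relative href against the page URL
absolute = [urljoin(base_url, href) for href in hrefs]
for url in absolute:
    print(url)

# It also handles '../' segments, which plain string concatenation does not
print(urljoin("https://books.toscrape.com/catalogue/page-1.html", "../index.html"))
```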

In [17]:
import requests
from bs4 import BeautifulSoup

# URL of the home page
url = 'https://books.toscrape.com/'

# Send a GET request to fetch the page's HTML
response = requests.get(url)

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')

# Find every book in the listing
books = soup.find('ol', class_='row').find_all('li')

# Extract the details of each book
for book in books:
    title = book.find('h3').find('a')['title']
    price = book.find('p', class_='price_color').text
    link = book.find('h3').find('a')['href']
    
    print(f'Title: {title}, Price: {price}, Link: {link}')

Title: A Light in the Attic, Price: Â£51.77, Link: catalogue/a-light-in-the-attic_1000/index.html
Title: Tipping the Velvet, Price: Â£53.74, Link: catalogue/tipping-the-velvet_999/index.html
Title: Soumission, Price: Â£50.10, Link: catalogue/soumission_998/index.html
Title: Sharp Objects, Price: Â£47.82, Link: catalogue/sharp-objects_997/index.html
Title: Sapiens: A Brief History of Humankind, Price: Â£54.23, Link: catalogue/sapiens-a-brief-history-of-humankind_996/index.html
Title: The Requiem Red, Price: Â£22.65, Link: catalogue/the-requiem-red_995/index.html
Title: The Dirty Little Secrets of Getting Your Dream Job, Price: Â£33.34, Link: catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html
Title: The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull, Price: Â£17.93, Link: catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html
Title: The Boys in the Boat: Nine Americans an
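
The `Â£` in the prices above is what UTF-8 bytes look like when they are decoded as Latin-1. A likely cause, assuming the server does not declare a charset, is that `requests` falls back to ISO-8859-1 for text responses. The sketch below reproduces the effect with plain strings, no network needed.

```python
pound = "£"

# '£' is two bytes in UTF-8
raw = pound.encode("utf-8")
print(raw)  # b'\xc2\xa3'

# Decoding those bytes with the wrong codec yields one character per byte
wrong = raw.decode("latin-1")
print(wrong)  # Â£

# Decoding with the right codec restores the original symbol
right = raw.decode("utf-8")
print(right)  # £
```

With `requests`, setting `response.encoding = 'utf-8'` before reading `response.text` applies the right codec.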

# Scraping and building the dictionary for the first 20 books

In [30]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# List that collects one dict per book
book_data = []

# Fetch the product-information table from a single book page
def get_book_details(book_url):
    response = requests.get(book_url)
    response.encoding = 'utf-8'  # Force the correct decoding
    soup = BeautifulSoup(response.text, 'html.parser')

    table = soup.find('table', class_='table table-striped')

    if table:
        print(f"Table found in {book_url}")
        rows = table.find_all('tr')

        book_info = {}
        for row in rows:
            header = row.find('th')
            value = row.find('td')
            if header and value:
                property_name = header.text.strip()
                property_value = value.text.strip()
                book_info[property_name] = property_value

        return book_info
    else:
        print(f"No table found in {book_url}")
        return None

# Fetch every book listed on one catalogue page
def get_books(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'html.parser')

    books = soup.find('ol', class_='row').find_all('li')

    for book in books:
        title = book.find('h3').find('a')['title']
        price = book.find('p', class_='price_color').text
        link = book.find('h3').find('a')['href']

        # Build the absolute link. urljoin resolves the href against the page URL,
        # whereas str.strip('../') removes characters, not a prefix, and is fragile.
        book_url = urljoin(url, link)

        print(f'\nTitle: {title}, Price: {price}, Link: {book_url}')

        book_details = get_book_details(book_url)
        if book_details:
            book_details['Title'] = title
            book_details['Price'] = price
            book_details['Link'] = book_url

            # Add the record to the list
            book_data.append(book_details)

# Base URL of the paginated catalogue
base_url = 'https://books.toscrape.com/catalogue/page-'

# Start from the first page
page_num = 1

# Keep fetching pages until at least 10 books have been collected.
# Each page lists 20 books, so the first page alone is enough.
while len(book_data) < 10:
    url = f'{base_url}{page_num}.html'
    response = requests.get(url)
    if response.status_code != 200:
        break  # Stop when there are no more pages

    print(f'Getting books from: {url}')
    get_books(url)

    page_num += 1  # Move on to the next page

# Build the DataFrame
df = pd.DataFrame(book_data)

# Show the DataFrame
print(df)

# Save the DataFrame to a CSV file
df.to_csv('books_data.csv', index=False)

Getting books from: https://books.toscrape.com/catalogue/page-1.html

Title: A Light in the Attic, Price: £51.77, Link: https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html
Table found in https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html

Title: Tipping the Velvet, Price: £53.74, Link: https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html
Table found in https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html

Title: Soumission, Price: £50.10, Link: https://books.toscrape.com/catalogue/soumission_998/index.html
Table found in https://books.toscrape.com/catalogue/soumission_998/index.html

Title: Sharp Objects, Price: £47.82, Link: https://books.toscrape.com/catalogue/sharp-objects_997/index.html
Table found in https://books.toscrape.com/catalogue/sharp-objects_997/index.html

Title: Sapiens: A Brief History of Humankind, Price: £54.23, Link: https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humank

In [31]:
df

| | UPC | Product Type | Price (excl. tax) | Price (incl. tax) | Tax | Availability | Number of reviews | Title | Price | Link |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a897fe39b1053632 | Books | £51.77 | £51.77 | £0.00 | In stock (22 available) | 0 | A Light in the Attic | £51.77 | https://books.toscrape.com/catalogue/a-light-i... |
| 1 | 90fa61229261140a | Books | £53.74 | £53.74 | £0.00 | In stock (20 available) | 0 | Tipping the Velvet | £53.74 | https://books.toscrape.com/catalogue/tipping-t... |
| 2 | 6957f44c3847a760 | Books | £50.10 | £50.10 | £0.00 | In stock (20 available) | 0 | Soumission | £50.10 | https://books.toscrape.com/catalogue/soumissio... |
| 3 | e00eb4fd7b871a48 | Books | £47.82 | £47.82 | £0.00 | In stock (20 available) | 0 | Sharp Objects | £47.82 | https://books.toscrape.com/catalogue/sharp-obj... |
| 4 | 4165285e1663650f | Books | £54.23 | £54.23 | £0.00 | In stock (20 available) | 0 | Sapiens: A Brief History of Humankind | £54.23 | https://books.toscrape.com/catalogue/sapiens-a... |
| 5 | f77dbf2323deb740 | Books | £22.65 | £22.65 | £0.00 | In stock (19 available) | 0 | The Requiem Red | £22.65 | https://books.toscrape.com/catalogue/the-requi... |
| 6 | 2597b5a345f45e1b | Books | £33.34 | £33.34 | £0.00 | In stock (19 available) | 0 | The Dirty Little Secrets of Getting Your Dream... | £33.34 | https://books.toscrape.com/catalogue/the-dirty... |
| 7 | e72a5dfc7e9267b2 | Books | £17.93 | £17.93 | £0.00 | In stock (19 available) | 0 | The Coming Woman: A Novel Based on the Life of... | £17.93 | https://books.toscrape.com/catalogue/the-comin... |
| 8 | e10e1e165dc8be4a | Books | £22.60 | £22.60 | £0.00 | In stock (19 available) | 0 | The Boys in the Boat: Nine Americans and Their... | £22.60 | https://books.toscrape.com/catalogue/the-boys-... |
| 9 | 1dfe412b8ac00530 | Books | £52.15 | £52.15 | £0.00 | In stock (19 available) | 0 | The Black Maria | £52.15 | https://books.toscrape.com/catalogue/the-black... |
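
Everything scraped so far is text, including prices and stock counts. A possible clean-up step, sketched on two rows copied from the table above (the `books`, `price_gbp` and `stock` names are illustrative), converts them to numbers so they can be sorted and averaged.

```python
import pandas as pd

# Two rows copied from the scraped table, reduced to the columns used here
books = pd.DataFrame({
    "Title": ["A Light in the Attic", "The Requiem Red"],
    "Price": ["£51.77", "£22.65"],
    "Availability": ["In stock (22 available)", "In stock (19 available)"],
})

# Drop the currency symbol and convert the rest to float
books["price_gbp"] = books["Price"].str.replace("£", "", regex=False).astype(float)

# Pull the number out of "In stock (N available)"
books["stock"] = (
    books["Availability"]
    .str.extract(r"\((\d+) available\)", expand=False)
    .astype(int)
)

print(books[["Title", "price_gbp", "stock"]])
print(books.sort_values("price_gbp")["Title"].tolist())
```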
