# Introduction to Quality Data & Engineering with Python
## Lecture 6 - Tutorial

If you haven't already installed **PostgreSQL**, download and install it from the official website.

https://www.postgresql.org/

### Task1: Install SQLAlchemy and the psycopg2 PostgreSQL driver.

In [2]:
!pip install SQLAlchemy psycopg2-binary



### Task2: Import the SQLAlchemy and pandas libraries into your notebook.

In [3]:
from sqlalchemy import create_engine, text
import pandas as pd

### Task3: Define the connection parameters for the PostgreSQL database.

These parameters include:

**USERNAME:** Username for accessing the database.

**PASSWORD:** Password for accessing the database.

**HOST:** Hostname or IP address of the database server.

**PORT:** Port number on which the database server is listening.

**DATABASE_NAME:** Name of the database you want to connect to.

In [4]:
USERNAME = 'postgres'
PASSWORD = 'postgres'
HOST = 'localhost'
PORT = 5432
DATABASE_NAME = 'postgres'

In [5]:
database_url = f'postgresql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE_NAME}'
database_url

'postgresql://postgres:postgres@localhost:5432/postgres'
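
The f-string URL above only works while the credentials contain no URL-reserved characters: in a password such as `p@ss:word/1` (a hypothetical value), the `@`, `:` and `/` would be read as URL delimiters. A minimal sketch that percent-encodes the password with the standard library's `urllib.parse.quote_plus`, the approach the SQLAlchemy documentation suggests for this case; all connection values are placeholders:

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

# Placeholder credentials; the password deliberately contains @, : and /
USERNAME = "postgres"
PASSWORD = "p@ss:word/1"
HOST = "localhost"
PORT = 5432
DATABASE_NAME = "postgres"

# Percent-encode the password so its reserved characters cannot be
# mistaken for the delimiters between user, password and host
database_url = (
    f"postgresql://{USERNAME}:{quote_plus(PASSWORD)}@{HOST}:{PORT}/{DATABASE_NAME}"
)
print(database_url)

# Parsing the URL back recovers the host, port and original password
parts = urlsplit(database_url)
print(parts.hostname, parts.port, unquote_plus(parts.password))
```

SQLAlchemy's `URL.create()` (from `sqlalchemy.engine`) is an alternative: it takes the raw, unescaped password as a keyword argument and its result can be passed to `create_engine()` directly.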

### Task4: Create a database engine.

Set the `echo` parameter to `True` if you want SQLAlchemy to log every SQL statement it sends to the database.


In [6]:
engine = create_engine(url=database_url, echo=True)

In [7]:
engine

Engine(postgresql://postgres:***@localhost:5432/postgres)

### Task5: Use the connect() method on the PostgreSQL engine to establish a connection to the database

In [8]:
pg_client_connect = engine.connect()

2024-05-24 11:33:59,118 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2024-05-24 11:33:59,119 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-05-24 11:33:59,122 INFO sqlalchemy.engine.Engine select current_schema()
2024-05-24 11:33:59,124 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-05-24 11:33:59,126 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2024-05-24 11:33:59,127 INFO sqlalchemy.engine.Engine [raw sql] {}


In [9]:
pg_client_connect 

<sqlalchemy.engine.base.Connection at 0x1174d09d0>

### Task6: Create the authors table using the SQL script below.

In [10]:
authors_table = """ DROP TABLE if EXISTS authors;
CREATE TABLE authors (
    author_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    specialization VARCHAR(100)
);
"""

In [11]:
with engine.begin() as conn:
    conn.execute(text(authors_table))

2024-05-24 11:37:30,986 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:37:30,987 INFO sqlalchemy.engine.Engine  DROP TABLE if EXISTS authors;
CREATE TABLE authors (
    author_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    specialization VARCHAR(100)
);

2024-05-24 11:37:30,988 INFO sqlalchemy.engine.Engine [generated in 0.00083s] {}
2024-05-24 11:37:30,999 INFO sqlalchemy.engine.Engine COMMIT


### Task7: Execute the SQL script provided below to insert sample data into the authors table

In [12]:
authors_insert = """
INSERT INTO authors (name, specialization) VALUES
    ('Richard Feynman', 'Physics'),
    ('Brian Greene', 'Physics'),
    ('Steven Hawking', 'Physics'),
    ('Stephen Wolfram', 'Computer Science'),
    ('Claude Shannon', 'Computer Science'),
    ('John McCarthy', 'Computer Science'),
    ('Ada Lovelace', 'Computer Science'),
    ('Alan Turing', 'Computer Science'),
    ('Marie Curie', 'Chemistry'),
    ('Charles Darwin', 'Biology');
"""

In [13]:
with engine.begin() as conn:
    conn.execute(text(authors_insert))

2024-05-24 11:39:13,008 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:39:13,012 INFO sqlalchemy.engine.Engine 
INSERT INTO authors (name, specialization) VALUES
    ('Richard Feynman', 'Physics'),
    ('Brian Greene', 'Physics'),
    ('Steven Hawking', 'Physics'),
    ('Stephen Wolfram', 'Computer Science'),
    ('Claude Shannon', 'Computer Science'),
    ('John McCarthy', 'Computer Science'),
    ('Ada Lovelace', 'Computer Science'),
    ('Alan Turing', 'Computer Science'),
    ('Marie Curie', 'Chemistry'),
    ('Charles Darwin', 'Biology');

2024-05-24 11:39:13,016 INFO sqlalchemy.engine.Engine [generated in 0.00322s] {}
2024-05-24 11:39:13,022 INFO sqlalchemy.engine.Engine COMMIT
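
Writing the values directly into the SQL string is fine for a fixed script, but values that come from Python variables should be passed as bound parameters so that quoting and escaping are left to the driver. A minimal sketch, assuming an in-memory SQLite database so it runs without a PostgreSQL server (the `text()` and `execute()` calls are the same against the PostgreSQL engine); passing a list of dictionaries makes SQLAlchemy run the INSERT once per dictionary:

```python
from sqlalchemy import create_engine, text

# Assumption: an in-memory SQLite database stands in for PostgreSQL so the
# sketch runs anywhere; demo_engine avoids overwriting the notebook's engine
demo_engine = create_engine("sqlite://")

new_authors = [
    {"name": "Marie Curie", "specialization": "Chemistry"},
    {"name": "Charles Darwin", "specialization": "Biology"},
]

with demo_engine.begin() as conn:
    # INTEGER PRIMARY KEY auto-numbers rows in SQLite, like SERIAL in PostgreSQL
    conn.execute(text(
        "CREATE TABLE authors ("
        "author_id INTEGER PRIMARY KEY, "
        "name VARCHAR(100) NOT NULL, "
        "specialization VARCHAR(100))"
    ))
    # :name and :specialization are bound parameters; a list of dicts
    # runs the statement once per dict (an "executemany")
    conn.execute(
        text("INSERT INTO authors (name, specialization) "
             "VALUES (:name, :specialization)"),
        new_authors,
    )

with demo_engine.connect() as conn:
    rows = conn.execute(
        text("SELECT author_id, name, specialization FROM authors ORDER BY author_id")
    ).all()

# Result rows support attribute access by column name
for row in rows:
    print(row.author_id, row.name, row.specialization)
```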


### Task8: Execute a SELECT query to retrieve all data from the authors table

In [20]:
sql_query = "SELECT * FROM authors"

with engine.begin() as conn:
    result = conn.execute(text(sql_query)).fetchall()

2024-05-24 11:43:24,642 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:43:24,644 INFO sqlalchemy.engine.Engine SELECT * FROM authors
2024-05-24 11:43:24,645 INFO sqlalchemy.engine.Engine [cached since 9.738s ago] {}
2024-05-24 11:43:24,648 INFO sqlalchemy.engine.Engine COMMIT


In [21]:
result

[(1, 'Richard Feynman', 'Physics'),
 (2, 'Brian Greene', 'Physics'),
 (3, 'Steven Hawking', 'Physics'),
 (4, 'Stephen Wolfram', 'Computer Science'),
 (5, 'Claude Shannon', 'Computer Science'),
 (6, 'John McCarthy', 'Computer Science'),
 (7, 'Ada Lovelace', 'Computer Science'),
 (8, 'Alan Turing', 'Computer Science'),
 (9, 'Marie Curie', 'Chemistry'),
 (10, 'Charles Darwin', 'Biology')]

### Task9: Fetch data from the authors table and convert it to a DataFrame

In [17]:
sql_query = "SELECT * FROM authors"
df = pd.read_sql(sql_query, pg_client_connect)

2024-05-24 11:42:22,159 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-05-24 11:42:22,166 INFO sqlalchemy.engine.Engine [cached since 58.74s ago] {'table_name': 'SELECT * FROM authors', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-05-24 11:42:22,170 INFO sqlalchemy.engine.Engine SELECT * FROM authors
2024-05-24 11:42:22,171 INFO sqlalchemy.engine.Engine [raw sql] {}


In [18]:
df

|   | author_id | name | specialization |
|---|---|---|---|
| 0 | 1 | Richard Feynman | Physics |
| 1 | 2 | Brian Greene | Physics |
| 2 | 3 | Steven Hawking | Physics |
| 3 | 4 | Stephen Wolfram | Computer Science |
| 4 | 5 | Claude Shannon | Computer Science |
| 5 | 6 | John McCarthy | Computer Science |
| 6 | 7 | Ada Lovelace | Computer Science |
| 7 | 8 | Alan Turing | Computer Science |
| 8 | 9 | Marie Curie | Chemistry |
| 9 | 10 | Charles Darwin | Biology |
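
The `pg_catalog` query in the echo log of Task 9 comes from pandas first checking whether the string is a table name before running it as a query. The connection used here was opened in Task 5 and stays open until Task 16. A sketch of an alternative, assuming an in-memory SQLite database in place of PostgreSQL: wrapping the query in `text()` makes it explicitly SQL, opening the connection in a `with` block closes it automatically, and `index_col` turns the primary key into the DataFrame index instead of the default 0..n-1 range.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite in place of the PostgreSQL engine
demo_engine = create_engine("sqlite://")

with demo_engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE authors (author_id INTEGER PRIMARY KEY, "
        "name TEXT NOT NULL, specialization TEXT)"
    ))
    conn.execute(
        text("INSERT INTO authors (name, specialization) VALUES (:name, :spec)"),
        [{"name": "Richard Feynman", "spec": "Physics"},
         {"name": "Ada Lovelace", "spec": "Computer Science"}],
    )

# Leaving the with block closes the connection (returning it to the pool),
# so no separate close() call is needed
with demo_engine.connect() as conn:
    authors_df = pd.read_sql(
        text("SELECT * FROM authors"), conn, index_col="author_id"
    )

print(authors_df)
```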


### Task10: Create the books table using the SQL script provided below


In [22]:
books_table = """ DROP TABLE if EXISTS books;
CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    publication_year INTEGER,
    author_id INTEGER REFERENCES authors(author_id)
);
"""

In [23]:
with engine.begin() as conn:
    conn.execute(text(books_table))

2024-05-24 11:44:17,902 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:44:17,906 INFO sqlalchemy.engine.Engine  DROP TABLE if EXISTS books;
CREATE TABLE books (
    book_id SERIAL PRIMARY KEY,
    title VARCHAR(200) NOT NULL,
    publication_year INTEGER,
    author_id INTEGER REFERENCES authors(author_id)
);

2024-05-24 11:44:17,908 INFO sqlalchemy.engine.Engine [generated in 0.00182s] {}
2024-05-24 11:44:17,919 INFO sqlalchemy.engine.Engine COMMIT


### Task11: Execute the SQL script provided below to insert sample data into the books table

In [25]:
books_insert = """
INSERT INTO books (title, publication_year, author_id) VALUES
    ('Surely You are Joking, Mr. Feynman!', 1985, 1),
    ('The Elegant Universe', 1999, 2),
    ('A Brief History of Time', 1988, 3),
    ('A New Kind of Science', 2002, 4),
    ('The Mathematical Theory of Communication', 1948, 5),
    ('Feynman Lectures on Computation', 1996, 1),
    ('Artificial Intelligence: A Modern Approach', 1995, 6),
    ('The Analytical Engine', 1843, 7),
    ('Computing Machinery and Intelligence', 1950, 8),
    ('Radioactive Substances', 1904, 9),
    ('On the Origin of Species', 1859, 10);
"""

In [26]:
with engine.begin() as conn:
    conn.execute(text(books_insert))

2024-05-24 11:44:44,392 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:44:44,395 INFO sqlalchemy.engine.Engine 
INSERT INTO books (title, publication_year, author_id) VALUES
    ('Surely You are Joking, Mr. Feynman!', 1985, 1),
    ('The Elegant Universe', 1999, 2),
    ('A Brief History of Time', 1988, 3),
    ('A New Kind of Science', 2002, 4),
    ('The Mathematical Theory of Communication', 1948, 5),
    ('Feynman Lectures on Computation', 1996, 1),
    ('Artificial Intelligence: A Modern Approach', 1995, 6),
    ('The Analytical Engine', 1843, 7),
    ('Computing Machinery and Intelligence', 1950, 8),
    ('Radioactive Substances', 1904, 9),
    ('On the Origin of Species', 1859, 10);

2024-05-24 11:44:44,400 INFO sqlalchemy.engine.Engine [generated in 0.00500s] {}
2024-05-24 11:44:44,407 INFO sqlalchemy.engine.Engine COMMIT


### Task12: Execute a SELECT query to retrieve all data from the books table

In [27]:
sql_query = "SELECT * FROM books"

with engine.begin() as conn:
    result = conn.execute(text(sql_query)).fetchall()
    
result

2024-05-24 11:45:45,228 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:45:45,230 INFO sqlalchemy.engine.Engine SELECT * FROM books
2024-05-24 11:45:45,231 INFO sqlalchemy.engine.Engine [generated in 0.00130s] {}
2024-05-24 11:45:45,234 INFO sqlalchemy.engine.Engine COMMIT


[(1, 'Surely You are Joking, Mr. Feynman!', 1985, 1),
 (2, 'The Elegant Universe', 1999, 2),
 (3, 'A Brief History of Time', 1988, 3),
 (4, 'A New Kind of Science', 2002, 4),
 (5, 'The Mathematical Theory of Communication', 1948, 5),
 (6, 'Feynman Lectures on Computation', 1996, 1),
 (7, 'Artificial Intelligence: A Modern Approach', 1995, 6),
 (8, 'The Analytical Engine', 1843, 7),
 (9, 'Computing Machinery and Intelligence', 1950, 8),
 (10, 'Radioactive Substances', 1904, 9),
 (11, 'On the Origin of Species', 1859, 10)]

### Task13: Fetch data from the books table and convert it to a DataFrame

In [28]:
sql_query = "SELECT * FROM books"
df = pd.read_sql(sql_query, pg_client_connect)

2024-05-24 11:46:14,881 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-05-24 11:46:14,884 INFO sqlalchemy.engine.Engine [cached since 291.5s ago] {'table_name': 'SELECT * FROM books', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-05-24 11:46:14,889 INFO sqlalchemy.engine.Engine SELECT * FROM books
2024-05-24 11:46:14,890 INFO sqlalchemy.engine.Engine [raw sql] {}


In [29]:
df

|   | book_id | title | publication_year | author_id |
|---|---|---|---|---|
| 0 | 1 | Surely You are Joking, Mr. Feynman! | 1985 | 1 |
| 1 | 2 | The Elegant Universe | 1999 | 2 |
| 2 | 3 | A Brief History of Time | 1988 | 3 |
| 3 | 4 | A New Kind of Science | 2002 | 4 |
| 4 | 5 | The Mathematical Theory of Communication | 1948 | 5 |
| 5 | 6 | Feynman Lectures on Computation | 1996 | 1 |
| 6 | 7 | Artificial Intelligence: A Modern Approach | 1995 | 6 |
| 7 | 8 | The Analytical Engine | 1843 | 7 |
| 8 | 9 | Computing Machinery and Intelligence | 1950 | 8 |
| 9 | 10 | Radioactive Substances | 1904 | 9 |
| 10 | 11 | On the Origin of Species | 1859 | 10 |


### Task14: Execute an SQL query to join the books and authors tables and convert the result to a DataFrame

In [30]:
sql_query = """

SELECT * FROM books
INNER JOIN authors
ON books.author_id = authors.author_id

"""

df_join = pd.read_sql(sql_query, pg_client_connect)

2024-05-24 11:49:24,250 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-05-24 11:49:24,252 INFO sqlalchemy.engine.Engine [cached since 480.8s ago] {'table_name': '\n\nSELECT * FROM books\nINNER JOIN authors\nON books.author_id = authors.author_id\n\n', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-05-24 11:49:24,259 INFO sqlalchemy.engine.Engine 

SELECT * FROM books
INNER JOIN authors
ON books.author_id = authors.author_id


2024-05-24 11:49:24,260 INFO sqlalchemy.engine.Engine [raw sql] {}


In [31]:
df_join

|   | book_id | title | publication_year | author_id | author_id | name | specialization |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Surely You are Joking, Mr. Feynman! | 1985 | 1 | 1 | Richard Feynman | Physics |
| 1 | 2 | The Elegant Universe | 1999 | 2 | 2 | Brian Greene | Physics |
| 2 | 3 | A Brief History of Time | 1988 | 3 | 3 | Steven Hawking | Physics |
| 3 | 4 | A New Kind of Science | 2002 | 4 | 4 | Stephen Wolfram | Computer Science |
| 4 | 5 | The Mathematical Theory of Communication | 1948 | 5 | 5 | Claude Shannon | Computer Science |
| 5 | 6 | Feynman Lectures on Computation | 1996 | 1 | 1 | Richard Feynman | Physics |
| 6 | 7 | Artificial Intelligence: A Modern Approach | 1995 | 6 | 6 | John McCarthy | Computer Science |
| 7 | 8 | The Analytical Engine | 1843 | 7 | 7 | Ada Lovelace | Computer Science |
| 8 | 9 | Computing Machinery and Intelligence | 1950 | 8 | 8 | Alan Turing | Computer Science |
| 9 | 10 | Radioactive Substances | 1904 | 9 | 9 | Marie Curie | Chemistry |
| 10 | 11 | On the Origin of Species | 1859 | 10 | 10 | Charles Darwin | Biology |
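
`SELECT *` over a join returns every column of both tables, so the result carries two `author_id` columns, one from `books` and one from `authors`. Listing the columns explicitly keeps each name once. A minimal sketch, assuming an in-memory SQLite database in place of PostgreSQL and loaded with two of the rows above:

```python
import pandas as pd
from sqlalchemy import create_engine, text

demo_engine = create_engine("sqlite://")  # assumption: SQLite stands in for PostgreSQL

with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, "
                      "name TEXT, specialization TEXT)"))
    conn.execute(text("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, "
                      "publication_year INTEGER, "
                      "author_id INTEGER REFERENCES authors(author_id))"))
    conn.execute(text("INSERT INTO authors VALUES "
                      "(1, 'Richard Feynman', 'Physics'), (9, 'Marie Curie', 'Chemistry')"))
    conn.execute(text("INSERT INTO books VALUES "
                      "(1, 'Surely You are Joking, Mr. Feynman!', 1985, 1), "
                      "(10, 'Radioactive Substances', 1904, 9)"))

# Name each column once; the aliases b and a keep the query short
join_query = text("""
    SELECT b.book_id, b.title, b.publication_year,
           a.author_id, a.name, a.specialization
    FROM books AS b
    INNER JOIN authors AS a ON b.author_id = a.author_id
    ORDER BY b.book_id
""")

with demo_engine.connect() as conn:
    df_join = pd.read_sql(join_query, conn)

print(df_join)
```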


### Task15: Execute an SQL query that lists the books written by Richard Feynman and returns them as a DataFrame

In [32]:
sql_query = """
SELECT name, title, publication_year FROM books b
INNER JOIN authors a
ON b.author_id = a.author_id
WHERE name = 'Richard Feynman'
"""
df_feynman = pd.read_sql(sql_query, engine)


2024-05-24 11:50:50,002 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 11:50:50,006 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-05-24 11:50:50,007 INFO sqlalchemy.engine.Engine [cached since 566.6s ago] {'table_name': "\nSELECT name, title, publication_year FROM books b\nINNER JOIN authors a\nON b.author_id = a.author_id\nWHERE name = 'Richard Feynman'\n", 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-05-24 11:50:50,018 INFO sqlalchemy.engine.Engine 
SELECT name, title, publication_year FROM boo

In [33]:
df_feynman

|   | name | title | publication_year |
|---|---|---|---|
| 0 | Richard Feynman | Surely You are Joking, Mr. Feynman! | 1985 |
| 1 | Richard Feynman | Feynman Lectures on Computation | 1996 |
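
Here the author's name is written into the SQL text. If the name came from user input, pasting it into the string would invite SQL injection and would break on names containing an apostrophe. A sketch that binds the name as a parameter using `text()` and the `params` argument of `pd.read_sql`, assuming an in-memory SQLite database in place of PostgreSQL and a hypothetical helper `books_by_author`:

```python
import pandas as pd
from sqlalchemy import create_engine, text

demo_engine = create_engine("sqlite://")  # assumption: stands in for PostgreSQL

with demo_engine.begin() as conn:
    conn.execute(text("CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT, "
                      "publication_year INTEGER, author_id INTEGER)"))
    conn.execute(text("INSERT INTO authors VALUES (1, 'Richard Feynman'), (2, 'Brian Greene')"))
    conn.execute(text("INSERT INTO books VALUES "
                      "(1, 'Surely You are Joking, Mr. Feynman!', 1985, 1), "
                      "(2, 'The Elegant Universe', 1999, 2), "
                      "(6, 'Feynman Lectures on Computation', 1996, 1)"))


def books_by_author(engine, author_name):
    """Hypothetical helper: return one author's books as a DataFrame."""
    query = text("""
        SELECT a.name, b.title, b.publication_year
        FROM books AS b
        INNER JOIN authors AS a ON b.author_id = a.author_id
        WHERE a.name = :author_name
        ORDER BY b.publication_year
    """)
    with engine.connect() as conn:
        # The value is bound to :author_name, never pasted into the SQL string
        return pd.read_sql(query, conn, params={"author_name": author_name})


df_feynman = books_by_author(demo_engine, "Richard Feynman")
print(df_feynman)
```

A name such as `O'Brien` now simply returns an empty DataFrame instead of producing a syntax error.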


### Task16: Close the connection to the PostgreSQL database

In [35]:
pg_client_connect.close()

2024-05-24 11:51:45,849 INFO sqlalchemy.engine.Engine ROLLBACK


### Task17: Load the Northwind database into PostgreSQL

https://tubcloud.tu-berlin.de/s/5rcomMN7p4QHE7x/download/northwind_postgre.sql

Hint: run `psql -h hostname -d databasename -U username -f file.sql` from a terminal, or execute the script's contents as a query (inside psql: `\i file.sql`).

#### Meta commands

Connect to a database from a terminal with:

`psql --host=localhost --username=postgres --dbname=tuberlin`

Meta-commands make life easier for a database administrator. They always start with a backslash (`\`), often followed by a single character:

- `\h` (or `\help`): list the SQL commands for which help is available (`\?` lists the meta-commands themselves)
- `\h CREATE DATABASE`: display help on a specific SQL command
- `\l`: list all databases
- `\du`: list all users (roles)
- `\c dbname` (or `\connect dbname`): connect (or switch) to the database called dbname
- `\dt`: list all tables in the current database (`\d` also works, but it lists views, sequences and other relations too)
- `\i file.sql`: execute the commands in file.sql, for example to load a dump
- `\q`: quit the psql shell
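
The psql hint can also be run from the notebook. A sketch using the standard library's `subprocess`, assuming `psql` is on the PATH; the helpers `build_psql_command` and `load_sql_file`, the database name `northwind`, and a local copy of `northwind_postgre.sql` are hypothetical. Besides the flags from the hint, it passes `-p` for the port and `-v ON_ERROR_STOP=1` so psql exits with a non-zero status at the first failing statement, and it supplies the password through the `PGPASSWORD` environment variable that psql reads.

```python
import os
import subprocess


def build_psql_command(host, port, user, dbname, sql_file):
    """Hypothetical helper: assemble the psql call that runs a .sql file."""
    return [
        "psql",
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-d", dbname,
        "-v", "ON_ERROR_STOP=1",  # stop at the first failing statement
        "-f", sql_file,
    ]


def load_sql_file(host, port, user, password, dbname, sql_file):
    """Hypothetical helper: run the script; check=True raises
    subprocess.CalledProcessError if psql exits with an error."""
    cmd = build_psql_command(host, port, user, dbname, sql_file)
    # PGPASSWORD lets psql authenticate without an interactive prompt
    env = {**os.environ, "PGPASSWORD": password}
    subprocess.run(cmd, env=env, check=True)


cmd = build_psql_command("localhost", 5432, "postgres", "northwind", "northwind_postgre.sql")
print(" ".join(cmd))
# load_sql_file("localhost", 5432, "postgres", "postgres", "northwind", "northwind_postgre.sql")
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.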


In [36]:
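import os

from dotenv import load_dotenv  # installed with: pip install python-dotenv

# override=True makes the values in the .env file take precedence over
# variables already set in the environment (on Windows, for example,
# USERNAME always holds the login name and would otherwise win)
load_dotenv(override=True)

USERNAME = os.getenv("USERNAME")
PASSWORD = os.getenv("PASSWORD")
HOST = os.getenv("HOST")
PORT = os.getenv("PORT")
DATABASE_NAME = os.getenv("DATABASE_NAME")

database_url = f'postgresql://{USERNAME}:{PASSWORD}@{HOST}:{PORT}/{DATABASE_NAME}'

engine = create_engine(url=database_url, echo=True)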
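
`os.getenv` silently returns `None` for a missing variable, which then ends up in the URL as the literal text `None` and only fails later, at connection time. A sketch of a stricter lookup with a hypothetical helper `require_env` that reports every missing name at once:

```python
import os


def require_env(*names):
    """Hypothetical helper: return the values of the named environment
    variables, raising one error that lists every variable that is
    unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return [os.environ[name] for name in names]


# Usage after load_dotenv(override=True):
# USERNAME, PASSWORD, HOST, PORT, DATABASE_NAME = require_env(
#     "USERNAME", "PASSWORD", "HOST", "PORT", "DATABASE_NAME")
```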

In [37]:
sql_query = "SELECT * FROM region"

In [38]:
df = pd.read_sql(sql_query, engine)

2024-05-24 12:05:16,101 INFO sqlalchemy.engine.Engine select pg_catalog.version()
2024-05-24 12:05:16,102 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-05-24 12:05:16,110 INFO sqlalchemy.engine.Engine select current_schema()
2024-05-24 12:05:16,111 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-05-24 12:05:16,121 INFO sqlalchemy.engine.Engine show standard_conforming_strings
2024-05-24 12:05:16,122 INFO sqlalchemy.engine.Engine [raw sql] {}
2024-05-24 12:05:16,128 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 12:05:16,129 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname

In [39]:
df

|   | regionid | regiondescription |
|---|---|---|
| 0 | 1 | Eastern |
| 1 | 2 | Western |
| 2 | 3 | Northern |
| 3 | 4 | Southern |


### Task18: Extract order information (order ID, customer contact name, and order date) with an SQL query that joins the orders table with the customers table on the customer ID

In [42]:
sql_query = """

SELECT orders.orderid, customers.contactname, orders.orderdate FROM orders
INNER JOIN customers
ON customers.customerid = orders.customerid

"""

In [43]:
df = pd.read_sql(sql_query, engine)
df

2024-05-24 12:09:57,913 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-05-24 12:09:57,920 INFO sqlalchemy.engine.Engine SELECT pg_catalog.pg_class.relname 
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace 
WHERE pg_catalog.pg_class.relname = %(table_name)s AND pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s, %(param_5)s]) AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s
2024-05-24 12:09:57,921 INFO sqlalchemy.engine.Engine [cached since 281.8s ago] {'table_name': '\n\nSELECT orders.orderid, customers.contactname, orders.orderdate FROM orders\nINNER JOIN customers\nON customers.customerid = orders.customerid\n\n', 'param_1': 'r', 'param_2': 'p', 'param_3': 'f', 'param_4': 'v', 'param_5': 'm', 'nspname_1': 'pg_catalog'}
2024-05-24 12:09:57,926 INFO sqlalchemy.engine.Engine 

SELECT orders.orderid, customers

|   | orderid | contactname | orderdate |
|---|---|---|---|
| 0 | 10248 | Paul Henriot | 1996-07-04 |
| 1 | 10249 | Karin Josephs | 1996-07-05 |
| 2 | 10250 | Mario Pontes | 1996-07-08 |
| 3 | 10251 | Mary Saveley | 1996-07-08 |
| 4 | 10252 | Pascale Cartrain | 1996-07-09 |
| ... | ... | ... | ... |
| 825 | 11073 | Guillermo Fernández | 1998-05-05 |
| 826 | 11074 | Jytte Petersen | 1998-05-06 |
| 827 | 11075 | Michael Holz | 1998-05-06 |
| 828 | 11076 | Laurence Lebihan | 1998-05-06 |
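
For comparison, the same inner join can be written in pandas with `DataFrame.merge` once both tables are DataFrames; letting the database do the join, as in the query above, avoids loading both full tables into memory first. A sketch with tiny DataFrames: the order IDs, contact names and dates of the first three orders come from the output above, while the `customerid` values and order `99999` are hypothetical.

```python
import pandas as pd

# customerid values and order 99999 are hypothetical
customers = pd.DataFrame({
    "customerid": ["C1", "C2", "C3"],
    "contactname": ["Paul Henriot", "Karin Josephs", "Mario Pontes"],
})
orders = pd.DataFrame({
    "orderid": [10248, 10249, 10250, 99999],
    "customerid": ["C1", "C2", "C3", "C9"],  # C9 matches no customer
    "orderdate": pd.to_datetime(
        ["1996-07-04", "1996-07-05", "1996-07-08", "1998-05-06"]
    ),
})

# how="inner" keeps only orders whose customerid exists in customers,
# like INNER JOIN customers ON customers.customerid = orders.customerid
order_info = orders.merge(customers, on="customerid", how="inner")[
    ["orderid", "contactname", "orderdate"]
]
print(order_info)
```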


### Task19: Install the requests library if it is not already installed, then import it.

In [44]:
!pip install requests



In [45]:
import requests

### Task20: Send a GET request to the posts URL ("https://jsonplaceholder.typicode.com/posts").

Check that the response status code is 200 (displayed as `<Response [200]>`).

In [46]:
url = "https://jsonplaceholder.typicode.com/posts"

In [47]:
response = requests.get(url)

In [48]:
response

<Response [200]>
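
Displaying the response object shows the status code, but a script usually has to react to it. A sketch of a hypothetical helper `get_json` that adds a timeout (requests waits indefinitely unless one is given) and calls `raise_for_status()`, which raises `requests.HTTPError` for 4xx and 5xx responses; the optional `session` argument is an assumption that lets the function be tested without a network connection.

```python
import requests


def get_json(url, session=None, timeout=10):
    """Hypothetical helper: GET `url` and return the decoded JSON body.

    Raises requests.HTTPError for 4xx/5xx status codes and
    requests.Timeout if the server does not answer within `timeout` seconds.
    """
    get = session.get if session is not None else requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # does nothing for status codes below 400
    return response.json()


# Usage against the tutorial's endpoint:
# posts = get_json("https://jsonplaceholder.typicode.com/posts")
```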

### Task21: If the response status code is 200, convert the response to JSON.

In [49]:
if response.status_code == 200:
    posts = response.json()

In [50]:
posts

[{'userId': 1,
  'id': 1,
  'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit',
  'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'},
 {'userId': 1,
  'id': 2,
  'title': 'qui est esse',
  'body': 'est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla'},
 {'userId': 1,
  'id': 3,
  'title': 'ea molestias quasi exercitationem repellat qui ipsa sit aut',
  'body': 'et iusto sed quo iure\nvoluptatem occaecati omnis eligendi aut ad\nvoluptatem doloribus vel accusantium quis pariatur\nmolestiae porro eius odio et labore et velit aut'},
 {'userId': 1,
  'id': 4,
  'title': 'eum et est occaecati',
  'body': 'ullam et saepe reiciendis voluptatem adipisci\nsit amet autem assumenda provid

### Task22: Create a DataFrame from the JSON response and print the titles of the posts.

In [51]:
df = pd.DataFrame(posts)

In [52]:
df

|   | userId | id | title | body |
|---|---|---|---|---|
| 0 | 1 | 1 | sunt aut facere repellat provident occaecati e... | `quia et suscipit\nsuscipit recusandae consequu...` |
| 1 | 1 | 2 | qui est esse | `est rerum tempore vitae\nsequi sint nihil repr...` |
| 2 | 1 | 3 | ea molestias quasi exercitationem repellat qui... | `et iusto sed quo iure\nvoluptatem occaecati om...` |
| 3 | 1 | 4 | eum et est occaecati | `ullam et saepe reiciendis voluptatem adipisci\...` |
| 4 | 1 | 5 | nesciunt quas odio | `repudiandae veniam quaerat sunt sed\nalias aut...` |
| ... | ... | ... | ... | ... |
| 95 | 10 | 96 | quaerat velit veniam amet cupiditate aut numqu... | `in non odio excepturi sint eum\nlabore volupta...` |
| 96 | 10 | 97 | quas fugiat ut perspiciatis vero provident | `eum non blanditiis soluta porro quibusdam volu...` |
| 97 | 10 | 98 | laboriosam dolor voluptates | `doloremque ex facilis sit sint culpa\nsoluta a...` |
| 98 | 10 | 99 | temporibus sit alias delectus eligendi possimu... | `quo deleniti praesentium dicta non quod\naut e...` |
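
The task also asks for the post titles, which the cell above does not print. Selecting the `title` column gives them; a sketch using the first three posts from the JSON response above (bodies omitted, and new variable names so the full `posts` and `df` are not overwritten):

```python
import pandas as pd

# The first three posts from the response above, trimmed to the fields
# needed here
sample_posts = [
    {"userId": 1, "id": 1,
     "title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit"},
    {"userId": 1, "id": 2, "title": "qui est esse"},
    {"userId": 1, "id": 3,
     "title": "ea molestias quasi exercitationem repellat qui ipsa sit aut"},
]
posts_df = pd.DataFrame(sample_posts)

# Selecting the column gives a Series of titles; print one per line
for title in posts_df["title"]:
    print(title)
```

With the full response, the same loop over `df["title"]` prints every title.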


### Bonus

In [61]:
import requests
import pandas as pd
from bs4 import BeautifulSoup  # installed with: pip install beautifulsoup4

url = "http://quotes.toscrape.com/"
response = requests.get(url, timeout=10)

if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each quote on the page sits in its own <div class="quote"> block
    quotes = soup.find_all('div', class_='quote')

    data = []
    for quote in quotes:
        # Named quote_text so it does not shadow sqlalchemy's text() imported earlier
        quote_text = quote.find('span', class_='text').get_text()
        author = quote.find('small', class_='author').get_text()
        data.append({'Quote': quote_text, 'Author': author})

    df = pd.DataFrame(data)
    print(df)
else:
    print(f"Failed to fetch the page: status code {response.status_code}")

                                               Quote             Author
0  “The world as we have created it is a process ...    Albert Einstein
1  “It is our choices, Harry, that show what we t...       J.K. Rowling
2  “There are only two ways to live your life. On...    Albert Einstein
3  “The person, be it gentleman or lady, who has ...        Jane Austen
4  “Imperfection is beauty, madness is genius and...     Marilyn Monroe
5  “Try not to become a man of success. Rather be...    Albert Einstein
6  “It is better to be hated for what you are tha...         André Gide
7  “I have not failed. I've just found 10,000 way...   Thomas A. Edison
8  “A woman is like a tea bag; you never know how...  Eleanor Roosevelt
9  “A day without sunshine is like, you know, nig...       Steve Martin


In [62]:
df

|   | Quote | Author |
|---|---|---|
| 0 | “The world as we have created it is a process ... | Albert Einstein |
| 1 | “It is our choices, Harry, that show what we t... | J.K. Rowling |
| 2 | “There are only two ways to live your life. On... | Albert Einstein |
| 3 | “The person, be it gentleman or lady, who has ... | Jane Austen |
| 4 | “Imperfection is beauty, madness is genius and... | Marilyn Monroe |
| 5 | “Try not to become a man of success. Rather be... | Albert Einstein |
| 6 | “It is better to be hated for what you are tha... | André Gide |
| 7 | “I have not failed. I've just found 10,000 way... | Thomas A. Edison |
| 8 | “A woman is like a tea bag; you never know how... | Eleanor Roosevelt |
| 9 | “A day without sunshine is like, you know, nig... | Steve Martin |


In [None]:
import requests
import re
import pandas as pd

# URL of the website to scrape
url = "http://quotes.toscrape.com/"

# Send a GET request to the website
response = requests.get(url)
response_text = response.text

# Define regex patterns to extract quotes and authors
quote_pattern = re.compile(r'<span class="text" itemprop="text">“([^”]+)”</span>')
author_pattern = re.compile(r'<small class="author" itemprop="author">([^<]+)</small>')

# Find all quotes and authors using regex
quotes = quote_pattern.findall(response_text)
authors = author_pattern.findall(response_text)

# Combine quotes and authors into a DataFrame
data = {'Quote': quotes, 'Author': authors}
df = pd.DataFrame(data)

# Display the DataFrame
print(df)


#### Components: Regex Pattern for Quotes


-   `r''`: The `r` before the string indicates a raw string, which means that backslashes are treated literally and not as escape characters.
-   `<span class="text" itemprop="text">`: Matches the literal string `<span class="text" itemprop="text">`. This is the HTML tag where the quote text is located.
-   `“`: Matches the opening quotation mark of the quote (a left curly double quote).
-   `([^”]+)`: This is a capturing group:
    -   `[` and `]`: Defines a character class.
    -   `^”`: The caret `^` inside the character class negates it, meaning "any character except the closing curly double quote `”`".
    -   `+`: Quantifier meaning "one or more" of the preceding character class.
-   `”`: Matches the closing quotation mark of the quote (a right curly double quote).
-   `</span>`: Matches the literal string `</span>`, which is the closing tag for the quote.

This pattern effectively captures the text between the curly double quotes inside the specified `<span>` tag.
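
The pattern can be checked on a small HTML fragment. In this sketch the fragment is hypothetical, shaped like the markup the pattern targets; it shows both a match and a limitation: because `[^”]+` stops at the first closing curly quote, a quote that itself contains `”` is not matched at all.

```python
import re

quote_pattern = re.compile(r'<span class="text" itemprop="text">“([^”]+)”</span>')

# Hypothetical fragment: the second quote contains a nested closing curly quote
html = (
    '<span class="text" itemprop="text">“First example quote.”</span>\n'
    '<span class="text" itemprop="text">“Second “nested” quote.”</span>'
)

# findall returns only the captured group, i.e. the text between the curly quotes;
# the second span is skipped because [^”]+ stops at the inner ”
print(quote_pattern.findall(html))
```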

#### Components: Regex Pattern for Authors


-   `r''`: The `r` before the string indicates a raw string, similar to the previous pattern.
-   `<small class="author" itemprop="author">`: Matches the literal string `<small class="author" itemprop="author">`. This is the HTML tag where the author's name is located.
-   `([^<]+)`: This is a capturing group:
    -   `[` and `]`: Defines a character class.
    -   `^<`: The caret `^` inside the character class negates it, meaning "any character except the opening angle bracket `<`".
    -   `+`: Quantifier meaning "one or more" of the preceding character class.
-   `</small>`: Matches the literal string `</small>`, which is the closing tag for the author's name.

This pattern captures the text between the `<small>` tags that contain the author's name.
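
Because the two patterns are applied independently, the regex script relies on `quotes` and `authors` having the same length and order. If one quote fails to match (for example, one containing a closing curly quote), the lists differ in length and `pd.DataFrame` raises a `ValueError`. A sketch of a guard with a hypothetical helper `pair_quotes` that checks the lengths before pairing, tested on a hypothetical fragment:

```python
import re

quote_pattern = re.compile(r'<span class="text" itemprop="text">“([^”]+)”</span>')
author_pattern = re.compile(r'<small class="author" itemprop="author">([^<]+)</small>')


def pair_quotes(html):
    """Hypothetical helper: return (quote, author) pairs, refusing to pair
    the two lists when their lengths differ."""
    quotes = quote_pattern.findall(html)
    authors = author_pattern.findall(html)
    if len(quotes) != len(authors):
        raise ValueError(
            f"{len(quotes)} quotes but {len(authors)} authors; "
            "the lists cannot be paired safely"
        )
    return list(zip(quotes, authors))


html = (
    '<span class="text" itemprop="text">“First example quote.”</span>'
    '<small class="author" itemprop="author">Author A</small>'
)
print(pair_quotes(html))
```

A length check cannot catch every misalignment; the BeautifulSoup version above avoids the problem by reading each quote and its author from the same `<div class="quote">` block.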