# **INFO5731 Assignment 2**

In this assignment, you will work on gathering text data from an open data source via web scraping or API. Following this, you will need to clean the text data and perform syntactic analysis on the data. Follow the instructions carefully and design well-structured Python programs to address each question.

**Expectations**:
*   Use the provided *.ipynb* document to write your code and respond to the questions. Do not create a new file.
*   Write complete answers and run all the cells before submission.
*   Make sure the submission is "clean"; *i.e.*, it contains no unnecessary code cells.
*   Once finished, enable sharing from the top-right corner (*see Canvas for details*).

* **Make sure to submit the cleaned data CSV in the comment section - 10 points**

**Total points**: 100

**Deadline**: Monday, at 11:59 PM.

**Late Submission will have a penalty of 10% reduction for each day after the deadline.**

**Please check that the link you submitted can be opened and points to the correct assignment.**


# Question 1 (25 points)

Write a Python program to collect text data from **one of the following sources** and save the data into a **CSV file:**

(1) Collect all the customer reviews of a product (you can choose any product) on Amazon [at least 1000 reviews].

(2) Collect the top 1000 user reviews of a movie released in 2023 or 2024 (you can choose any movie) from IMDb. [If one movie doesn't have sufficient reviews, collect reviews of at least 2 or 3 movies.]

(3) Collect the **abstracts** of the top 10000 research papers by using the query "machine learning", "data science", "artificial intelligence", or "information extraction" from Semantic Scholar.

(4) Collect all the information of the 904 narrators in the Densho Digital Repository.

(5) **Collect a total of 10000 reviews** of the top 100 most popular software products from G2 and Capterra.


In [None]:
# Your code here
import requests
import csv
import time

def fetch_with_exponential_backoff(url, params, max_attempts=5):
    """Attempt an HTTP GET request with exponential backoff on rate limiting."""
    delay = 5  # initial delay in seconds
    for attempt in range(max_attempts):
        # A timeout keeps a stalled connection from hanging the notebook
        resp = requests.get(url, params=params, timeout=30)
        if resp.status_code == 429:
            print(f"429 received. Retrying in {delay} seconds (attempt {attempt + 1})...")
            time.sleep(delay)
            delay *= 2
        else:
            return resp
    return resp  # return the final response if still failing

def main():
    # Configuration parameters
    search_keyword = "machine learning"
    desired_fields = "paperId,title,abstract,year"
    records_per_page = 100
    total_target = 10000
    api_url = "https://api.semanticscholar.org/graph/v1/paper/search"
    year_span = range(2000, 2025)
    output_file = "semantic_scholar_abstracts.csv"

    collected_records = 0

    with open(output_file, mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["paperId", "title", "abstract", "year"])  # header row

        for yr in year_span:
            page_offset = 0
            # Append the year to the query text to vary the result set across years;
            # note this adds a search term, it does not filter by publication year
            query_text = f"{search_keyword} {yr}"
            print(f"\nStarting data retrieval for year {yr}.")

            while collected_records < total_target:
                params = {
                    "query": query_text,
                    "offset": page_offset,
                    "limit": records_per_page,
                    "fields": desired_fields
                }
                print(f"Requesting records {page_offset + 1} to {page_offset + records_per_page} for {yr}...")
                response = fetch_with_exponential_backoff(api_url, params)

                if response.status_code == 400:
                    print(f"Bad request at offset {page_offset} for {yr}. Skipping to next year.")
                    break
                elif response.status_code != 200:
                    # Only the inner loop is exited, so retrieval resumes with the next year
                    print(f"Unexpected status code {response.status_code} for {yr}. Skipping to next year.")
                    break

                results = response.json()
                papers = results.get("data", [])
                if not papers:
                    print(f"No additional papers found for {yr}.")
                    break

                # Write each paper's details to the CSV file
                for paper in papers:
                    writer.writerow([
                        paper.get("paperId", ""),
                        paper.get("title", ""),
                        paper.get("abstract", ""),
                        paper.get("year", "")
                    ])
                    collected_records += 1
                    if collected_records >= total_target:
                        break

                page_offset += records_per_page
                time.sleep(1)  # pause briefly to mitigate rate limits

            if collected_records >= total_target:
                break

    print(f"\nData collection complete. Total records saved: {collected_records} in '{output_file}'.")

if __name__ == "__main__":
    main()


Starting data retrieval for year 2000.
Requesting records 1 to 100 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
Requesting records 101 to 200 for 2000...
Requesting records 201 to 300 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
429 received. Retrying in 10 seconds (attempt 2)...
429 received. Retrying in 20 seconds (attempt 3)...
Requesting records 301 to 400 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
Requesting records 401 to 500 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
429 received. Retrying in 10 seconds (attempt 2)...
Requesting records 501 to 600 for 2000...
Requesting records 601 to 700 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
Requesting records 701 to 800 for 2000...
Requesting records 801 to 900 for 2000...
Requesting records 901 to 1000 for 2000...
Requesting records 1001 to 1100 for 2000...
429 received. Retrying in 5 seconds (attempt 1)...
429 received. Retrying in 10 second
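The loop above appends the year to the query text, so the same paper can be written more than once across years, and papers whose abstract is null are still saved as empty rows. Below is a minimal sketch of two hypothetical helpers that address both points, assuming the Graph API's paper search accepts a `year` filter parameter (confirm against the current Semantic Scholar API documentation). The network call is left out so the helpers can be checked offline.

```python
def build_search_params(keyword: str, year: int, offset: int, limit: int = 100,
                        fields: str = "paperId,title,abstract,year") -> dict:
    """Query parameters for one results page, filtered by publication year.

    Assumes the search endpoint accepts a `year` filter; the original loop
    instead appended the year to the query text.
    """
    return {"query": keyword, "year": str(year), "offset": offset,
            "limit": limit, "fields": fields}


def keep_new_abstracts(papers: list, seen_ids: set) -> list:
    """Return papers that have an abstract and a paperId not seen before."""
    fresh = []
    for paper in papers:
        pid = paper.get("paperId")
        # Skip duplicates across pages/years and papers with a null abstract
        if not pid or pid in seen_ids or not paper.get("abstract"):
            continue
        seen_ids.add(pid)
        fresh.append(paper)
    return fresh
```

In the existing loop, `params = build_search_params(search_keyword, yr, page_offset)` would replace the hand-built dict, and `keep_new_abstracts(papers, seen_ids)` would filter each page before writing, with `seen_ids = set()` created once before the year loop.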

# Question 2 (15 points)

Write a Python program to **clean the text data** you collected in the previous question and save the clean data in a new column in the CSV file. The data cleaning steps include the following [code and output are required for each part]:

(1) Remove noise, such as special characters and punctuations.

(2) Remove numbers.

(3) Remove stopwords by using the stopwords list.

(4) Lowercase all text.

(5) Stemming.

(6) Lemmatization.

In [None]:
import csv
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download necessary NLTK data files (if not already installed)
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

def remove_noise(text):
    """Remove special characters and punctuations."""
    return re.sub(r'[^\w\s]', '', text)

def remove_numbers(text):
    """Remove numbers from text."""
    return re.sub(r'\d+', '', text)

# Build the stopword set, stemmer, and lemmatizer once instead of on every call
STOP_WORDS = set(stopwords.words('english'))
STEMMER = PorterStemmer()
LEMMATIZER = WordNetLemmatizer()

def remove_stopwords(text):
    """Remove English stopwords (case-insensitive match)."""
    words = text.split()
    filtered_words = [word for word in words if word.lower() not in STOP_WORDS]
    return ' '.join(filtered_words)

def to_lowercase(text):
    """Convert text to lowercase."""
    return text.lower()

def apply_stemming(text):
    """Apply Porter stemming."""
    return ' '.join(STEMMER.stem(word) for word in text.split())

def apply_lemmatization(text):
    """Apply WordNet lemmatization."""
    return ' '.join(LEMMATIZER.lemmatize(word) for word in text.split())

def clean_text(text):
    print("\n--- Cleaning Process Start ---")
    print("Original Text:")
    print(text)

    step1 = remove_noise(text)
    print("\nAfter noise removal (special characters & punctuation removed):")
    print(step1)

    step2 = remove_numbers(step1)
    print("\nAfter number removal:")
    print(step2)

    step3 = remove_stopwords(step2)
    print("\nAfter stopwords removal:")
    print(step3)

    step4 = to_lowercase(step3)
    print("\nAfter converting to lowercase:")
    print(step4)

    step5 = apply_stemming(step4)
    print("\nAfter stemming:")
    print(step5)

    step6 = apply_lemmatization(step5)
    print("\nAfter lemmatization:")
    print(step6)
    print("--- Cleaning Process End ---\n")

    return step6

def process_csv(input_csv, output_csv):
    with open(input_csv, mode='r', encoding='utf-8', newline='') as infile, \
         open(output_csv, mode='w', encoding='utf-8', newline='') as outfile:
        reader = csv.DictReader(infile)
        fieldnames = reader.fieldnames + ["clean_abstract"]
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()

        for row in reader:
            original_text = row.get("abstract", "")
            if original_text.strip():
                # clean_text prints every intermediate step for every row, which is
                # why the notebook output below is truncated
                cleaned_text = clean_text(original_text)
            else:
                cleaned_text = ""
            row["clean_abstract"] = cleaned_text
            writer.writerow(row)

if __name__ == '__main__':
    input_csv_file = "semantic_scholar_abstracts.csv"
    output_csv_file = "semantic_scholar_abstracts_clean.csv"
    process_csv(input_csv_file, output_csv_file)
    print(f"\nData cleaning complete. Clean data saved in '{output_csv_file}'.")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


Streaming output truncated to the last 5000 lines.


METHODS
A total of 158 patients were recruited in a retrospective cohort study for the assessment and comparison of facial symmetry before and after OGS from January 2018 to March 2020 Threedimensional facial photographs were captured by the 3dMD face system in a natural head position with eyes looking forward relaxed facial muscles and habitual dental occlusion before and at least 6 months after surgery Threedimensional contour images were extracted from 3D facial images for the subsequent Webbased automatic assessment of facial symmetry by using the transfer learning with a convolutional neural network model


RESULTS
The mean score of postoperative facial symmetry showed significant improvements from 274 to 352 and the improvement degree of facial symmetry in percentage after surgery was 21 using the constructed machine learning model A Webbased system provided a userfriendly interface and quick assessment results fo
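The cleaning cell prints every intermediate step for every abstract, which is why the output above is truncated after 5000 lines. A minimal sketch of an alternative, assuming the same six steps in the same order: a hypothetical `clean_steps` returns each intermediate result in a dict, so the notebook can print the steps for a few sample rows and write only the final string for the rest. The stemmer and lemmatizer are passed in as callables (identity by default) so the function has no hard dependency on NLTK data.

```python
import re
from typing import Callable, Dict, Set


def clean_steps(text: str, stop_words: Set[str],
                stem: Callable[[str], str] = lambda w: w,
                lemmatize: Callable[[str], str] = lambda w: w) -> Dict[str, str]:
    """Run the six cleaning steps in order and return every intermediate result."""
    steps = {}
    # (1) remove punctuation and special characters
    steps["noise_removed"] = re.sub(r"[^\w\s]", "", text)
    # (2) remove digits
    steps["numbers_removed"] = re.sub(r"\d+", "", steps["noise_removed"])
    # (3) remove stopwords (case-insensitive match)
    steps["stopwords_removed"] = " ".join(
        w for w in steps["numbers_removed"].split() if w.lower() not in stop_words)
    # (4) lowercase
    steps["lowercased"] = steps["stopwords_removed"].lower()
    # (5) stem each token, (6) lemmatize each stemmed token
    steps["stemmed"] = " ".join(stem(w) for w in steps["lowercased"].split())
    steps["lemmatized"] = " ".join(lemmatize(w) for w in steps["stemmed"].split())
    return steps


def show_steps(steps: Dict[str, str]) -> None:
    """Print each step; intended for a handful of sample rows only."""
    for name, value in steps.items():
        print(f"{name}: {value}")
```

With NLTK, one would call `clean_steps(text, set(stopwords.words('english')), PorterStemmer().stem, WordNetLemmatizer().lemmatize)`, call `show_steps` only when the row index is below, say, 3, and store `steps['lemmatized']` in `clean_abstract`.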

# Question 3 (15 points)

Write a Python program to **conduct syntax and structure analysis of the clean text** you just saved above. The syntax and structure analysis includes:

(1) **Parts of Speech (POS) Tagging:** Tag the part of speech of each word in the text, and calculate the total number of nouns (N), verbs (V), adjectives (Adj), and adverbs (Adv), respectively.

(2) **Constituency Parsing and Dependency Parsing:** Print out the constituency parsing trees and dependency parsing trees of all the sentences. Use one sentence as an example to explain your understanding of the constituency parsing tree and the dependency parsing tree.

(3) **Named Entity Recognition:** Extract all the entities, such as person names, organizations, locations, product names, and dates, from the clean texts, and calculate the count of each entity.
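The four totals in (1) can also be computed from fine-grained Penn Treebank tags, such as the `NNP`, `VBP`, and `JJ` tags that spaCy's `token.tag_` or `nltk.pos_tag` produce. The sketch below is a hypothetical `count_pos` helper, assuming tag-prefix matching (NN*, VB*, JJ*, RB*) is an acceptable grouping; it counts proper nouns as nouns and leaves out wh-adverbs such as `WRB`. Because the input is cleaned and stemmed text, some tags may be wrong, so the totals are approximate.

```python
from collections import Counter
from typing import Iterable, Tuple

# Penn Treebank tag prefixes mapped to the four requested categories
PREFIXES = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}


def count_pos(tagged: Iterable[Tuple[str, str]]) -> Counter:
    """Count nouns, verbs, adjectives, and adverbs in (word, tag) pairs."""
    counts = Counter({name: 0 for name in PREFIXES.values()})
    for _word, tag in tagged:
        # NNS/NNP/NNPS -> noun, VBD/VBP/... -> verb, JJR/JJS -> adjective, RBR/RBS -> adverb
        category = PREFIXES.get(tag[:2])
        if category:
            counts[category] += 1
    return counts
```

For a spaCy `doc`, `count_pos((t.text, t.tag_) for t in doc)` gives the per-document counts, which can be accumulated across documents with `Counter.update`.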

In [None]:
# Your code here
import pandas as pd
import spacy
import nltk
from nltk import RegexpParser
from collections import Counter

# Download necessary NLTK data (only if not already present)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# -------------------------------
# Load spaCy model for English
# -------------------------------
nlp = spacy.load("en_core_web_sm")

# -------------------------------
# Load cleaned text data from CSV
# -------------------------------
data_file = "semantic_scholar_abstracts_clean.csv"
df = pd.read_csv(data_file)
# the cleaned texts are in a column named 'clean_abstract'
clean_texts = df["clean_abstract"].dropna().tolist()

# (1) Parts of Speech (POS) Tagging
noun_total = 0
verb_total = 0
adj_total = 0
adv_total = 0

# Process each cleaned text
for text in clean_texts:
    doc = nlp(text)
    for token in doc:
        # Count both common and proper nouns
        if token.pos_ in ["NOUN", "PROPN"]:
            noun_total += 1
        elif token.pos_ == "VERB":
            verb_total += 1
        elif token.pos_ == "ADJ":
            adj_total += 1
        elif token.pos_ == "ADV":
            adv_total += 1

print("=== POS Tagging Counts ===")
print("Nouns:", noun_total)
print("Verbs:", verb_total)
print("Adjectives:", adj_total)
print("Adverbs:", adv_total)

# (2) Constituency Parsing and Dependency Parsing
if clean_texts:
    doc_example = nlp(clean_texts[0])
    # Extract the first sentence
    sentence = list(doc_example.sents)[0]
    print("\n=== Example Sentence for Parsing ===")
    print(sentence.text)

    # --- Constituency Parsing ---
    # We generate a shallow constituency parse tree using NLTK's RegexpParser.
    # First, extract tokens and their POS tags (using the fine-grained tag from spaCy).
    pos_tags = [(token.text, token.tag_) for token in sentence]
    print("\nPOS Tags (word, tag):")
    print(pos_tags)

    # Define a simple grammar for chunking (shallow constituency parsing)
    grammar = r"""
      NP: {<DT>?<JJ>*<NN.*>+}       # Noun Phrase
      PP: {<IN><NP>}               # Prepositional Phrase
      VP: {<VB.*><NP|PP>*}          # Verb Phrase
    """
    cp = RegexpParser(grammar)
    constituency_tree = cp.parse(pos_tags)
    print("\n--- Constituency Parse Tree (Shallow) ---")
    print(constituency_tree)

    # --- Dependency Parsing ---
    print("\n--- Dependency Parsing ---")
    for token in sentence:
        # For each token, print the dependency label and the head word.
        print(f"{token.text} ({token.dep_}) --> {token.head.text}")

    # --- Explanation ---
    print("\n--- Explanation ---")
    print("Constituency Parsing Tree: This tree groups words into constituents (such as NP for noun phrases and VP for verb phrases) "
          "using a simple grammar. Although shallow, it provides a hierarchical view of the sentence structure.")
    print("Dependency Parsing Tree: This shows the grammatical relationships between words. Each word (token) is linked to its 'head', "
          "revealing which words depend on others. This helps us understand the syntactic roles of the words within the sentence.")

# (3) Named Entity Recognition (NER)
# We extract and count entities such as PERSON, ORG, GPE, PRODUCT, and DATE.
entity_counts = Counter()
for text in clean_texts:
    doc = nlp(text)
    for ent in doc.ents:
        if ent.label_ in {"PERSON", "ORG", "GPE", "PRODUCT", "DATE"}:
            entity_counts[ent.label_] += 1

print("\n=== Named Entity Recognition Counts ===")
for entity, count in entity_counts.items():
    print(f"{entity}: {count}")


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


=== POS Tagging Counts ===
Nouns: 908088
Verbs: 133205
Adjectives: 117401
Adverbs: 13899

=== Example Sentence for Parsing ===
articl survey content workshop postprocess machin learn data mine interpret visual integr relat topic within kdd sixth acm sigkdd intern confer knowledg discoveri data mine boston usa august correspond web site wwwacmorgsigkddkdd first survey paper introduc state art workshop topic emphas postprocess form signific compon knowledg discoveri databas kdd next articl bring report content analysi discus aspect regard workshop afterward survey workshop paper found download wwwcasmcmastercabruhakddkddrephtml author report work organ workshop programm committe form addit three research field

POS Tags (word, tag):
[('articl', 'NNP'), ('survey', 'NN'), ('content', 'NN'), ('workshop', 'NNP'), ('postprocess', 'NN'), ('machin', 'NN'), ('learn', 'VBP'), ('data', 'NNS'), ('mine', 'PRP'), ('interpret', 'VBP'), ('visual', 'JJ'), ('integr', 'NNP'), ('relat', 'NNP'), ('topic', '
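The dependency loop above prints one `word (label) --> head` line per token for the first sentence only, while (2) asks for trees of all sentences. A minimal sketch of a hypothetical `render_dependency_tree` that turns `(text, dep, head_index)` triples into an indented tree, assuming spaCy's convention that the root token is its own head. It is pure Python so it can be tested without a model.

```python
from typing import Dict, List, Tuple


def render_dependency_tree(tokens: List[Tuple[str, str, int]]) -> List[str]:
    """Render (text, dep_label, head_index) triples as indented lines.

    head_index is relative to the sentence; a token whose head is itself is a root.
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(tokens))}
    roots = []
    for i, (_text, _dep, head) in enumerate(tokens):
        if head == i:
            roots.append(i)
        else:
            children[head].append(i)

    lines: List[str] = []

    def walk(i: int, depth: int) -> None:
        # Each level of dependency is indented by two spaces under its head
        text, dep, _head = tokens[i]
        lines.append("  " * depth + f"{text} ({dep})")
        for child in children[i]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines
```

For every sentence, `triples = [(t.text, t.dep_, t.head.i - sent.start) for t in sent]` converts spaCy's document-level head indices to sentence-level ones, and `print('\n'.join(render_dependency_tree(triples)))` prints the tree. Because the cleaned text has no punctuation, spaCy treated the whole first abstract as one sentence in the output above; parsing the original `abstract` column would likely give shorter, more meaningful sentences.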

# **The following questions must be answered using AI assistance**

# Question 4 (20 points)

Q4. (PART-1)
Scrape data from the GitHub Marketplace to gather details about popular actions. Using Python, the process begins by sending HTTP requests to multiple pages of the marketplace (1000 products), handling pagination through dynamic page numbers. The key details extracted include the product name, a short description, and the URL.

The extracted data is stored in a structured CSV format with columns for product name, description, URL, and page number. A time delay is introduced between requests to avoid overloading the server. ChatGPT can assist with parsing the HTML, handling errors, and generating reports based on the collected data.

The goal is to complete the scraping within a specified time limit, ensuring that the process is efficient and adheres to GitHub’s usage guidelines.

(PART-2)

1. **Preprocess Data**: Clean the text by tokenizing, removing stopwords, and converting to lowercase.

2. Perform **Data Quality** operations.


Preprocessing:
Preprocessing involves cleaning the text by removing noise such as special characters, HTML tags, and unnecessary whitespace. It also includes tasks like tokenization, stopword removal, and lemmatization to standardize the text for analysis.

Data Quality:
Data quality checks ensure completeness, consistency, and accuracy by verifying that all required columns are filled and formatted correctly. Additionally, it involves identifying and removing duplicates, handling missing values, and ensuring the data reflects the true content accurately.
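The data quality step can be made measurable by reporting what is missing or duplicated before and after cleaning. A minimal sketch, assuming rows are dicts as produced by `csv.DictReader`, that the scraper's `"N/A"` placeholder means missing, and that the URL identifies a product; `quality_report` is a hypothetical helper, not part of the assignment code.

```python
from typing import Dict, Iterable, List


def _is_blank(value) -> bool:
    """Treat None, empty strings, and the scraper's "N/A" placeholder as missing."""
    return value is None or str(value).strip() in {"", "N/A"}


def quality_report(rows: List[Dict], required: Iterable[str], key: str = "URL") -> Dict:
    """Summarize completeness (missing values per column) and duplicate keys."""
    required = list(required)
    # Booleans sum as 0/1, giving the number of blank cells per column
    missing = {col: sum(_is_blank(row.get(col)) for row in rows) for col in required}
    keys = [row.get(key) for row in rows if not _is_blank(row.get(key))]
    return {
        "rows": len(rows),
        "missing": missing,
        "duplicate_keys": len(keys) - len(set(keys)),
    }
```

Running it on the scraped rows and again on the cleaned rows gives a before/after summary for the write-up.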


GitHub Marketplace page:
https://github.com/marketplace?type=actions

In [None]:
import requests
from bs4 import BeautifulSoup
import csv
import time
import random

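def scrape_page(page_num: int) -> list:
    """Fetch one Marketplace results page and return a list of product dicts.

    Returns an empty list if the request does not succeed with HTTP 200.
    """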
    url = f"https://github.com/marketplace?type=actions&page={page_num}"
    headers = {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/115.0.0.0 Safari/537.36"),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9"
    }

    # A timeout keeps a stalled request from hanging the whole run
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    cards = soup.find_all("div", attrs={"data-testid": "marketplace-item"})
    results = []

    for card in cards:
        # Extract product name and URL
        name = "N/A"
        url_abs = "N/A"
        h3 = card.find("h3")
        if h3:
            link = h3.find("a", href=True)
            if link:
                name = link.get_text(strip=True)
                url_abs = "https://github.com" + link.get("href", "")

        # Extract description (if available)
        desc_tag = card.find("p", class_="mt-1 mb-0 text-small fgColor-muted line-clamp-2")
        desc = desc_tag.get_text(strip=True) if desc_tag else "N/A"

        results.append({
            "Product Name": name,
            "Description": desc,
            "URL": url_abs,
            "Page Number": page_num
        })
    return results

def run_scraper(max_pages: int = 500, output_file: str = "scraped_products.csv"):
    """
    Scrapes multiple pages and saves the product data to a CSV file.

    Args:
        max_pages (int): Number of pages to scrape.
        output_file (str): Output CSV filename.
    """
    all_products = []

    for p in range(1, max_pages + 1):
        products = scrape_page(p)
        if products:
            all_products.extend(products)
        time.sleep(random.uniform(1, 3))

    fieldnames = ["Product Name", "Description", "URL", "Page Number"]
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for prod in all_products:
            writer.writerow(prod)

    print(f"Scraping complete. Data saved to '{output_file}'.")

if __name__ == "__main__":
    run_scraper()

Scraping complete. Data saved to 'scraped_products.csv'.
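`run_scraper` always requests `max_pages` pages, even though the target is 1000 products. A minimal sketch of a hypothetical `collect_products` that stops once the target is reached or a page comes back empty, and skips repeated URLs; the page fetcher and the pause are passed in so the logic can be tested without network access. The stop-on-empty rule is an assumption: since `scrape_page` also returns an empty list on a non-200 response, a single failed request would end the run.

```python
from typing import Callable, Dict, List


def collect_products(fetch_page: Callable[[int], List[Dict]], target: int = 1000,
                     max_pages: int = 500,
                     pause: Callable[[], None] = lambda: None) -> List[Dict]:
    """Collect unique products page by page until `target` rows or an empty page."""
    rows: List[Dict] = []
    seen_urls = set()
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            # Empty page: assume we ran past the last page (or the request failed)
            break
        for item in items:
            url = item.get("URL", "N/A")
            # Deduplicate by URL, but keep rows whose URL could not be extracted
            if url != "N/A":
                if url in seen_urls:
                    continue
                seen_urls.add(url)
            rows.append(item)
            if len(rows) >= target:
                return rows
        pause()
    return rows
```

`collect_products(scrape_page, target=1000, pause=lambda: time.sleep(random.uniform(1, 3)))` would reuse the existing page scraper and delay.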


### Part 2

In [None]:
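import re

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# word_tokenize relies on the punkt_tab resource in recent NLTK releases
nltk.download('punkt_tab')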
# Remove HTML tags, special characters, and digits, then lowercase the text
def basic_clean(text: str) -> str:
    # Remove HTML tags
    text = re.sub(r'<[^>]*>', '', text)
    # Remove non-letter characters
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    # Lowercase and strip whitespace
    return text.lower().strip()

# Initialize lemmatizer and stopwords
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

# Preprocess text by cleaning, tokenizing, removing stopwords, and lemmatizing
def preprocess_text(text: str) -> str:
    # Clean the text
    cleaned = basic_clean(text)
    # Tokenize the cleaned text
    tokens = word_tokenize(cleaned)
    # Remove stopwords and lemmatize each token
    processed_tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in stop_words]
    # Return tokens joined back into a string
    return " ".join(processed_tokens)

# Clean the dataset by removing duplicates, dropping incomplete rows, and applying text preprocessing
def clean_dataset(input_csv: str = "scraped_products.csv", output_csv: str = "clean_scraped_product_data.csv"):
    # Load data from CSV
    df = pd.read_csv(input_csv)
    # Remove duplicate rows
    df.drop_duplicates(inplace=True)
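    # Drop rows with missing values in critical columns; read_csv treats the
    # scraper's "N/A" placeholder as NaN by default, so those rows are dropped too
    df.dropna(subset=["Product Name", "Description"], inplace=True)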
    # Reset index after cleaning
    df.reset_index(drop=True, inplace=True)

    # Apply preprocessing to 'Product Name' and add as a new column
    df["Product Name Processed"] = df["Product Name"].astype(str).apply(preprocess_text)
    # Apply preprocessing to 'Description' and add as a new column
    df["Description Processed"] = df["Description"].astype(str).apply(preprocess_text)

    # Save the cleaned DataFrame to a new CSV file
    df.to_csv(output_csv, index=False, encoding="utf-8")
    print(f"Cleaned data saved as: '{output_csv}'.")

if __name__ == "__main__":
    clean_dataset()

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


Cleaned data saved as: 'clean_scraped_product_data.csv'.


# Question 5 (20 points)

PART 1:
Scrape tweets from Twitter using the Tweepy API, specifically targeting hashtags related to the subtopics machine learning or artificial intelligence.
The extracted data includes the tweet ID, username, and text.
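Because the search query combines hashtags with filters, grouping matters. A minimal sketch of a hypothetical `build_query` helper, assuming the v2 recent-search syntax in which implicit AND binds before OR (per X's query-building documentation as we understand it), so the hashtags are wrapped in parentheses to make `-is:retweet` and `lang:en` apply to all of them.

```python
from typing import Iterable, Optional


def build_query(hashtags: Iterable[str], lang: Optional[str] = "en",
                exclude_retweets: bool = True) -> str:
    """OR the hashtags together inside parentheses, then append the filters."""
    # lstrip('#') lets callers pass either "ai" or "#ai"
    query = "(" + " OR ".join(f"#{tag.lstrip('#')}" for tag in hashtags) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    if lang:
        query += f" lang:{lang}"
    return query
```

`build_query(['machinelearning', 'artificialintelligence'])` yields the string passed as `query` to `search_recent_tweets`.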

Part 2:
Perform data cleaning procedures.

A final data quality check ensures the completeness and consistency of the dataset. The cleaned data is then saved into a CSV file for further analysis.
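The final quality check can go beyond dropping empty rows: promotional tweets are often posted many times with the same text. A minimal sketch of a hypothetical `dedupe_tweets`, assuming rows are dicts with `tweet_id` and `clean_text` keys; treating identical cleaned text as a duplicate is our own judgement call, not a requirement of the assignment.

```python
import re
from typing import Dict, List


def dedupe_tweets(rows: List[Dict]) -> List[Dict]:
    """Keep the first row per tweet_id and per normalized clean_text; drop empty text."""
    seen_ids, seen_texts, kept = set(), set(), []
    for row in rows:
        # Normalize whitespace so "a  b" and "a b " count as the same text
        text = re.sub(r"\s+", " ", str(row.get("clean_text", ""))).strip()
        if not text or row.get("tweet_id") in seen_ids or text in seen_texts:
            continue
        seen_ids.add(row.get("tweet_id"))
        seen_texts.add(text)
        kept.append({**row, "clean_text": text})
    return kept
```

With pandas, `pd.DataFrame(dedupe_tweets(df.to_dict('records')))` applies it to the cleaned DataFrame.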


**Note**

1.   Follow the tutorials provided in Canvas to obtain API keys. Use ChatGPT to get the code. Make sure the file is downloaded and saved.
2.   Make sure you divide the GPT code as shown in the tutorials; don't make multiple requests.


In [None]:
# Part 1: Scrape Tweets using Tweepy API
!pip install tweepy
import os
import re
import time

import pandas as pd
import tweepy

# Read the bearer token from an environment variable instead of hardcoding a secret in the
# notebook (the name TWITTER_BEARER_TOKEN is our choice; set it before running this cell)
BEARER_TOKEN = os.environ["TWITTER_BEARER_TOKEN"]

# Initialize the Tweepy client with the bearer token
client = tweepy.Client(bearer_token=BEARER_TOKEN)

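# Build query to search for tweets with hashtags #machinelearning or #artificialintelligence.
# The parentheses matter: implicit AND binds before OR, so without them the retweet and
# language filters would apply only to #artificialintelligence
query = "(#machinelearning OR #artificialintelligence) -is:retweet lang:en"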

# Make one request to get up to 100 recent tweets
response = client.search_recent_tweets(query=query,
                                       tweet_fields=["id", "text", "author_id"],
                                       expansions=["author_id"],
                                       user_fields=["username"],
                                       max_results=100)

# Build a mapping from author_id to username from the includes
user_lookup = {}
if response.includes and "users" in response.includes:
    for user in response.includes["users"]:
        user_lookup[user.id] = user.username

# Collect tweet data: tweet id, username, and text
tweets_data = []
if response.data:
    for tweet in response.data:
        username = user_lookup.get(tweet.author_id, "N/A")
        tweets_data.append({
            "tweet_id": tweet.id,
            "username": username,
            "text": tweet.text
        })

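# Convert scraped data to a DataFrame and perform basic quality check; passing the column
# names keeps dropna from raising a KeyError when the search returns no tweets
df = pd.DataFrame(tweets_data, columns=["tweet_id", "username", "text"])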
df.dropna(subset=["tweet_id", "username", "text"], inplace=True)
df.drop_duplicates(inplace=True)

# Part 2: Data Cleaning Procedures

# Function to remove URLs from text
def remove_urls(text):
    return re.sub(r'http\S+', '', text)

# Function to remove mentions from text
def remove_mentions(text):
    return re.sub(r'@\w+', '', text)

# Function to remove the hash symbol from hashtags (keeping the word)
def remove_hash_symbol(text):
    return re.sub(r'#', '', text)

# Function to remove punctuation and special characters
def remove_punctuation(text):
    return re.sub(r'[^\w\s]', '', text)

# Function to clean tweet text
def clean_text(text):
    text = remove_urls(text)
    text = remove_mentions(text)
    text = remove_hash_symbol(text)
    text = remove_punctuation(text)
    # Collapse the runs of spaces left behind by the removals above
    text = re.sub(r'\s+', ' ', text)
    return text.lower().strip()

# Apply cleaning function to tweet text and create a new column
df["clean_text"] = df["text"].apply(clean_text)

# Final quality check: drop any rows that have become empty after cleaning
df = df[df["clean_text"].str.len() > 0]

# Save the cleaned data to a CSV file for further analysis
output_csv = "cleaned_tweets.csv"
df.to_csv(output_csv, index=False, encoding="utf-8")
print(f"Cleaned tweet data saved to '{output_csv}'.")


Cleaned tweet data saved to 'cleaned_tweets.csv'.


# Mandatory Question

Provide your thoughts on the assignment. What did you find challenging, and what aspects did you enjoy? Your opinion on the provided time to complete the assignment.

# Write your response below

The assignment helped me work on my scraping skills. Even though it was complex and challenging, I also enjoyed it because it pushed me to improve those skills.


Fill out the survey and provide your valuable feedback.

https://docs.google.com/forms/d/e/1FAIpQLSd_ObuA3iNoL7Az_C-2NOfHodfKCfDzHZtGRfIker6WyZqTtA/viewform?usp=dialog