<a href="https://colab.research.google.com/github/Sachikethan/Sachikethan_INFO5731_Fall2024/blob/main/Guntha_Sachikethan_Assignment_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **INFO5731 Assignment 2**

In this assignment, you will work on gathering text data from an open data source via web scraping or API. Following this, you will need to clean the text data and perform syntactic analysis on the data. Follow the instructions carefully and design well-structured Python programs to address each question.

**Expectations**:
*   Use the provided .*ipynb* document to write your code & respond to the questions. Avoid generating a new file.
*   Write complete answers and run all the cells before submission.
*   Make sure the submission is "clean"; *i.e.*, no unnecessary code cells.
*   Once finished, allow shared rights from top right corner (*see Canvas for details*).

* **Make sure to submit the cleaned data CSV in the comment section - 10 points**

**Total points**: 100

**Deadline**: Monday, at 11:59 PM.

**Late Submission will have a penalty of 10% reduction for each day after the deadline.**

**Please check that the link you submitted can be opened and points to the correct assignment.**


# Question 1 (25 points)

Write a Python program to collect text data from **one of the following sources** and save the data to a **CSV file:**

(1) Collect all the customer reviews of a product (you can choose any product) on Amazon. [at least 1000 reviews]

(2) Collect the top 1000 user reviews of a movie released in 2023 or 2024 (you can choose any movie) from IMDb. [If one movie doesn't have enough reviews, collect reviews of at least 2 or 3 movies]


(3) Collect the **abstracts** of the top 10000 research papers by using the query "machine learning", "data science", "artificial intelligence", or "information extraction" from Semantic Scholar.

(4) Collect all the information of the 904 narrators in the Densho Digital Repository.

(5) **Collect a total of 10000 reviews** of the top 100 most popular software from G2 and Capterra.


In [None]:
import requests
import csv
import time

API_ENDPOINT = "https://api.semanticscholar.org/graph/v1/paper/search"
SEARCH_TERM = "machine learning"
RESULT_FIELDS = "paperId,title,abstract,year"
PAGE_SIZE = 100
TARGET_RECORDS = 10000
YEARS_TO_SEARCH = range(2000, 2025)
OUTPUT_CSV = "semantic_scholar_abstracts.csv"

def fetch_data(url, params, max_attempts=5):
    """Attempt to fetch data from the API with exponential backoff for rate limiting."""
    delay = 5
    for attempt in range(max_attempts):
        # A timeout keeps a stalled connection from hanging the whole run
        response = requests.get(url, params=params, timeout=30)
        if response.status_code == 429:
            print(f"Rate limit encountered. Waiting {delay} seconds before retrying...")
            time.sleep(delay)
            delay *= 2
        else:
            return response
    return response  # Return the last response if still failing

def main():
    total_records = 0

    with open(OUTPUT_CSV, mode='w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(["paperId", "title", "abstract", "year"])

        for year in YEARS_TO_SEARCH:
            offset = 0
            query = f"{SEARCH_TERM} {year}"
            print(f"\nStarting collection for year {year}...")

            while True:
                if total_records >= TARGET_RECORDS:
                    break

                params = {
                    "query": query,
                    "offset": offset,
                    "limit": PAGE_SIZE,
                    "fields": RESULT_FIELDS
                }

                print(f"Requesting records {offset+1} to {offset+PAGE_SIZE} for {year}...")
                response = fetch_data(API_ENDPOINT, params)

                if response.status_code == 400:
                    print(f"Bad request for offset {offset} in year {year}. Skipping this year.")
                    break
                if response.status_code != 200:
                    # The break only leaves the paging loop, so collection resumes with the next year
                    print(f"Unexpected error {response.status_code}. Skipping the rest of year {year}.")
                    break

                results = response.json().get("data", [])
                if not results:
                    print(f"No more records found for year {year} at offset {offset}.")
                    break

                for paper in results:
                    csv_writer.writerow([
                        paper.get("paperId", ""),
                        paper.get("title", ""),
                        paper.get("abstract", ""),
                        paper.get("year", "")
                    ])
                    total_records += 1
                    if total_records >= TARGET_RECORDS:
                        break

                offset += PAGE_SIZE
                time.sleep(1)

            if total_records >= TARGET_RECORDS:
                break

    print(f"\nCollection complete. {total_records} records saved to '{OUTPUT_CSV}'.")

if __name__ == "__main__":
    main()


Starting collection for year 2000...
Requesting records 1 to 100 for 2000...
Requesting records 101 to 200 for 2000...
Rate limit encountered. Waiting 5 seconds before retrying...
Requesting records 201 to 300 for 2000...
Requesting records 301 to 400 for 2000...
Requesting records 401 to 500 for 2000...
Requesting records 501 to 600 for 2000...
Rate limit encountered. Waiting 5 seconds before retrying...
Rate limit encountered. Waiting 10 seconds before retrying...
Rate limit encountered. Waiting 20 seconds before retrying...
Requesting records 601 to 700 for 2000...
Rate limit encountered. Waiting 5 seconds before retrying...
Rate limit encountered. Waiting 10 seconds before retrying...
Rate limit encountered. Waiting 20 seconds before retrying...
Rate limit encountered. Waiting 40 seconds before retrying...
Rate limit encountered. Waiting 80 seconds before retrying...
Unexpected error 429. Halting the process.

Starting collection for year 2001...
Requesting records 1 to 100 for 20

# Question 2 (15 points)

Write a python program to **clean the text data** you collected in the previous question and save the clean data in a new column in the csv file. The data cleaning steps include: [Code and output is required for each part]

(1) Remove noise, such as special characters and punctuations.

(2) Remove numbers.

(3) Remove stopwords by using the stopwords list.

(4) Lowercase all texts

(5) Stemming.

(6) Lemmatization.
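
As a minimal sketch of how steps (1) to (4) chain together, the snippet below applies them to one made-up sentence using only the standard library. The function names, the sample sentence, and the tiny `SAMPLE_STOPWORDS` set are assumptions for illustration; a real run would use a full stopword list (for example NLTK's) and add stemming and lemmatization on top.

```python
import re

# Hypothetical miniature stopword list, for illustration only.
SAMPLE_STOPWORDS = {"the", "of", "in", "a", "is", "and", "to"}

def remove_noise(text):
    """Step 1: drop everything except letters, digits, and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)

def remove_numbers(text):
    """Step 2: drop runs of digits."""
    return re.sub(r"\d+", "", text)

def remove_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Step 3: drop stopwords, comparing case-insensitively."""
    return " ".join(w for w in text.split() if w.lower() not in stopwords)

def to_lowercase(text):
    """Step 4: lowercase the remaining text."""
    return text.lower()

sample = "In 2023, the F1-score of the model is 0.91!"
cleaned = sample
for step in (remove_noise, remove_numbers, remove_stopwords, to_lowercase):
    cleaned = step(cleaned)
    print(f"{step.__name__}: {cleaned!r}")
```

Keeping each step as its own function makes it easy to print the intermediate result that the question asks for at every stage.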

In [None]:
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# Download required NLTK resources (if not already present)
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Define the input and output file names
input_csv = "semantic_scholar_abstracts.csv"
output_csv = "semantic_scholar_abstracts_cleaned.csv"

# Read the CSV file containing the abstracts
df = pd.read_csv(input_csv)

# Define a cleaning function that performs each required step and prints output at each stage
def clean_text(text):

    # 1. Remove noise: special characters and punctuation.
    # Remove any character that is not a letter, digit, or whitespace
    # (\w alone would also keep underscores, so they are matched explicitly).
    no_punct = re.sub(r'[^\w\s]|_', '', text)
    print("\n----- After Removing Special Characters/Punctuation -----")
    print(no_punct)

    # 2. Remove numbers.
    no_numbers = re.sub(r'\d+', '', no_punct)
    print("\n----- After Removing Numbers -----")
    print(no_numbers)

    # 3. Remove stopwords.
    # Tokenize by splitting on whitespace and filter out common English stopwords.
    tokens = no_numbers.split()
    stop_words = set(stopwords.words('english'))
    tokens_no_stop = [word for word in tokens if word.lower() not in stop_words]
    no_stop_text = " ".join(tokens_no_stop)
    print("\n----- After Removing Stopwords -----")
    print(no_stop_text)

    # 4. Convert all text to lowercase.
    lower_text = no_stop_text.lower()
    print("\n----- After Converting to Lowercase -----")
    print(lower_text)

    # 5. Stemming using the PorterStemmer.
    stemmer = PorterStemmer()
    stemmed_tokens = [stemmer.stem(word) for word in lower_text.split()]
    stemmed_text = " ".join(stemmed_tokens)
    print("\n----- After Stemming -----")
    print(stemmed_text)

    # 6. Lemmatization using the WordNetLemmatizer.
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = [lemmatizer.lemmatize(word) for word in lower_text.split()]
    lemmatized_text = " ".join(lemmatized_tokens)
    print("\n----- After Lemmatization -----")
    print(lemmatized_text)

    # Return the final cleaned text (here, we choose to return the lemmatized text)
    return lemmatized_text

# Drop rows with a missing abstract first; str(NaN) would otherwise become the literal text "nan"
df = df.dropna(subset=['abstract']).copy()

# Apply the cleaning function to each abstract and store the result in a new column 'clean_abstract'
df['clean_abstract'] = df['abstract'].astype(str).apply(clean_text)

# Drop rows whose cleaned abstract ended up empty
df = df[df['clean_abstract'].str.strip() != '']

# Save the updated DataFrame with the new clean column to a new CSV file
df.to_csv(output_csv, index=False)

print(f"\nCleaned data has been saved to '{output_csv}'.")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


Streaming output truncated to the last 5000 lines.
background cardiovascular diseas cvd current lead caus prematur death worldwid modelbas earli detect highrisk popul cvd key cvd prevent thu research aim use machin learn ml algorithm establish cvd predict model base routin physic examin indic suitabl xinjiang rural popul method research cohort data collect divid two stage first stage involv baselin survey followup end decemb secondphas baselin survey conduct septemb decemb followup end august total particip uyghur kazak includ studi screen predictor establish variabl subset base least absolut shrinkag select oper lasso regress logist regress forward partial likelihood estim flr random forest rf featur import rf variabl import select subset variabl compar l regular logist regress llr rf support vector machin svm adaboost algorithm establish cvd predict model suitabl popul incid cvd popul analyz result year followup total peopl diagnos cvd cumul incid comparison discrimin c

# Question 3 (15 points)

Write a python program to **conduct syntax and structure analysis of the clean text** you just saved above. The syntax and structure analysis includes:

(1) **Parts of Speech (POS) Tagging:** Tag Parts of Speech of each word in the text, and calculate the total number of N(oun), V(erb), Adj(ective), Adv(erb), respectively.

(2) **Constituency Parsing and Dependency Parsing:** print out the constituency parsing trees and dependency parsing trees of all the sentences. Using one sentence as an example to explain your understanding about the constituency parsing tree and dependency parsing tree.

(3) **Named Entity Recognition:** Extract all the entities such as person names, organizations, locations, product names, and date from the clean texts, calculate the count of each entity.
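
The counting part of (1) and (3) can be sketched independently of any particular tagger. The example below is a rough illustration using `collections.Counter`; the `tagged_tokens` and `entities` lists are made-up stand-ins for the output of a POS tagger and an NER model, and `tally` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (token, coarse POS tag) pairs, shaped like a tagger's output.
tagged_tokens = [
    ("model", "NOUN"), ("predicts", "VERB"), ("accurate", "ADJ"),
    ("results", "NOUN"), ("quickly", "ADV"), ("the", "DET"),
]
# Hypothetical (entity text, label) pairs, shaped like an NER model's output.
entities = [("Google", "ORG"), ("2023", "DATE"), ("Stanford", "ORG")]

POS_OF_INTEREST = {"NOUN", "VERB", "ADJ", "ADV"}

def tally(pairs, keep=None):
    """Count the label of each pair, optionally restricted to the labels in `keep`."""
    return Counter(label for _, label in pairs if keep is None or label in keep)

pos_counts = tally(tagged_tokens, POS_OF_INTEREST)
entity_counts = tally(entities)
print(pos_counts)
print(entity_counts)
```

A single `Counter` keeps the totals for all categories in one place, which avoids maintaining one variable per tag.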

In [None]:
import spacy
import pandas as pd
from collections import Counter
import nltk
from nltk import RegexpParser
# Download necessary NLTK resources
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
# Load spaCy model
nlp = spacy.load("en_core_web_sm")

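# Load the cleaned CSV file and skip rows whose cleaned abstract is missing,
# since str(NaN) would otherwise be analyzed as the word "nan"
df = pd.read_csv("semantic_scholar_abstracts_cleaned.csv")
df = df.dropna(subset=["clean_abstract"])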

# (1) POS Tagging & Counting
noun_count = 0
verb_count = 0
adj_count = 0
adv_count = 0

print("----- POS Tagging Totals -----")
for index, row in df.iterrows():
    text = str(row["clean_abstract"])
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "NOUN":
            noun_count += 1
        elif token.pos_ == "VERB":
            verb_count += 1
        elif token.pos_ == "ADJ":
            adj_count += 1
        elif token.pos_ == "ADV":
            adv_count += 1

print("Total Nouns:", noun_count)
print("Total Verbs:", verb_count)
print("Total Adjectives:", adj_count)
print("Total Adverbs:", adv_count)

# (2) Constituency Parsing and Dependency Parsing
if not df.empty:
    sample_text = str(df["clean_abstract"].iloc[0])
    doc_sample = nlp(sample_text)
    sentences = list(doc_sample.sents)

    if sentences:
        example_sentence = sentences[0]
        print("\n----- Example Sentence for Parsing -----")
        print("Sentence:", example_sentence.text)

        # Dependency Parsing using spaCy:
        print("\nDependency Parse Tree:")
        def print_dependency(token, level=0):
            print("  " * level + f"{token.text} [{token.dep_}]")
            for child in token.children:
                print_dependency(child, level + 1)

        # Identify the root token of the sentence (spaCy exposes it as Span.root).
        root = example_sentence.root
        print_dependency(root)

        # Constituency Parsing using NLTK's RegexpParser:
        print("\nConstituency Parse Tree (using RegexpParser):")
        # Tokenize and POS tag the sentence with NLTK.
        tokens = nltk.word_tokenize(example_sentence.text)
        pos_tags = nltk.pos_tag(tokens)

        # Define a simple grammar for chunking.
        grammar = r"""
          NP: {<DT>?<JJ>*<NN.*>+}   # Noun Phrase
          PP: {<IN><NP>}            # Prepositional Phrase, defined before VP which uses it
          VP: {<VB.*><NP|PP>*}       # Verb Phrase
        """
        cp = RegexpParser(grammar)
        tree = cp.parse(pos_tags)
        tree.pretty_print()

        # Explanation:
        print("\nExplanation:")
        print("• The constituency parse tree above breaks the sentence into phrases, such as NP (noun phrase),")
        print("  showing the hierarchical structure of the sentence.")
        print("• The dependency parse tree shows how individual words relate to one another, with each word connected to its head word by a dependency label.")
    else:
        print("No sentences found in the sample text")
else:
    print("The CSV file is empty.")


# (3) Named Entity Recognition (NER) Counts
# Count entities for PERSON, ORG, GPE, PRODUCT, and DATE.
entity_counts = Counter()

for index, row in df.iterrows():
    text = str(row["clean_abstract"])
    doc = nlp(text)
    for ent in doc.ents:
        if ent.label_ in ["PERSON", "ORG", "GPE", "PRODUCT", "DATE"]:
            entity_counts[ent.label_] += 1

print("\n----- Named Entity Recognition Counts -----")
for entity, count in entity_counts.items():
    print(f"{entity}: {count}")



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


----- POS Tagging Totals -----
Total Nouns: 615006
Total Verbs: 231457
Total Adjectives: 212798
Total Adverbs: 46202

----- Example Sentence for Parsing -----
Sentence: nan

Dependency Parse Tree:
nan [ROOT]

Constituency Parse Tree (using RegexpParser):
  S   
  |    
  NP  
  |    
nan/NN


Explanation:
• The constituency parse tree above breaks the sentence into phrases, such as NP (noun phrase),
  showing the hierarchical structure of the sentence.
• The dependency parse tree shows how individual words relate to one another, with each word connected to its head word by a dependency label.

----- Named Entity Recognition Counts -----
PERSON: 8347
ORG: 8560
DATE: 5767
GPE: 3163
PRODUCT: 153
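
The `Sentence: nan` line in the output above is most likely a missing abstract rather than real text: a missing value converted with `str()` becomes the literal string `nan`, which then passes through cleaning and parsing like an ordinary word. The sketch below reproduces that behavior with the standard library and shows one possible guard; `safe_text` is a hypothetical helper, not part of the notebook.

```python
import math

def safe_text(value):
    """Return the value as text, or None when it is missing (None or NaN)."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return str(value)

missing = float("nan")   # pandas represents an empty text cell this way by default
print(str(missing))      # the literal text "nan"
print(safe_text(missing))
print(safe_text("deep learning improves accuracy"))
```

Filtering out rows where the guard returns `None` before tagging and parsing keeps placeholder values out of the counts and the example sentence.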


# **The following questions must be answered using AI assistance**

# Question 4 (20 points)

Q4. (PART-1)
Scrape data from the GitHub Marketplace to gather details about popular actions. Using Python, send HTTP requests to multiple pages of the marketplace (1000 products), handling pagination through the page number in the URL. The key details to extract are the product name, a short description, and the URL.

Store the extracted data in a structured CSV format with columns for product name, description, URL, and page number. Introduce a time delay between requests to avoid overloading the server. ChatGPT can assist with parsing the HTML, handling errors, and generating reports based on the collected data.

The goal is to complete the scraping within a specified time limit while keeping the process efficient and within GitHub’s usage guidelines.

(PART-2)

1.   **Preprocess Data**: Clean the text by tokenizing, removing stopwords, and converting to lowercase.

2. Perform **Data Quality** operations.


Preprocessing:
Preprocessing involves cleaning the text by removing noise such as special characters, HTML tags, and unnecessary whitespace. It also includes tasks like tokenization, stopword removal, and lemmatization to standardize the text for analysis.

Data Quality:
Data quality checks ensure completeness, consistency, and accuracy by verifying that all required columns are filled and formatted correctly. Additionally, it involves identifying and removing duplicates, handling missing values, and ensuring the data reflects the true content accurately.
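
The data quality checks described above could be sketched roughly as follows. This is a standard-library illustration under stated assumptions: `quality_report` is a hypothetical helper, the rows are made-up examples, and treating a URL as well-formed when it starts with `https://github.com/` is an assumed rule rather than a requirement from the assignment.

```python
from collections import Counter

REQUIRED_COLUMNS = ("Name", "Description", "URL")

def quality_report(rows, required=REQUIRED_COLUMNS):
    """Summarize missing values, duplicate URLs, and malformed URLs in a list of dict rows."""
    missing = Counter()
    for row in rows:
        for column in required:
            if not (row.get(column) or "").strip():
                missing[column] += 1
    url_counts = Counter(row.get("URL") for row in rows if row.get("URL"))
    duplicates = sum(count - 1 for count in url_counts.values() if count > 1)
    bad_urls = sum(1 for url in url_counts if not url.startswith("https://github.com/"))
    return {"rows": len(rows), "missing": dict(missing),
            "duplicate_urls": duplicates, "malformed_urls": bad_urls}

# Made-up rows for illustration only
sample_rows = [
    {"Name": "Checkout", "Description": "Check out a repository",
     "URL": "https://github.com/marketplace/actions/checkout"},
    {"Name": "Checkout", "Description": "Check out a repository",
     "URL": "https://github.com/marketplace/actions/checkout"},
    {"Name": "Linter", "Description": "",
     "URL": "https://github.com/marketplace/actions/linter"},
    {"Name": "", "Description": "No name", "URL": "marketplace/actions/broken"},
]
report = quality_report(sample_rows)
print(report)
```

Reporting the counts before dropping anything makes it visible how many rows each cleaning decision removes.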


GitHub Marketplace page:
https://github.com/marketplace?type=actions

In [None]:
# ----- Part 1: Web Scraping Code -----
import requests
from bs4 import BeautifulSoup
import time
import random
import csv

def scrape_page(page_number):
    # Construct URL for the current page
    url = f"https://github.com/marketplace?type=actions&page={page_number}"
    # Set headers to mimic a real browser
    headers = {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/115.0.0.0 Safari/537.36"),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://github.com/"
    }
    try:
        # Send GET request to the URL with a timeout
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        # Print error message if fetching fails
        print(f"Error fetching page {page_number}: {e}")
        return []

    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")
    # Find product cards using the data-testid attribute
    items = soup.find_all("div", attrs={"data-testid": "marketplace-item"})

    products = []
    for item in items:
        # Extract product name and URL from the <h3> element
        h3 = item.find("h3")
        if h3:
            link = h3.find("a", href=True)
            if link:
                name = link.get_text(strip=True)
                product_url = "https://github.com" + link["href"]
            else:
                name, product_url = None, None
        else:
            name, product_url = None, None

        # Extract the product description from the <p> tag
        p_tag = item.find("p", class_="mt-1 mb-0 text-small fgColor-muted line-clamp-2")
        description = p_tag.get_text(strip=True) if p_tag else ""

        if name and product_url:
            products.append({
                "Name": name,
                "Description": description,
                "URL": product_url,
                "Page": page_number
            })
    return products

def scrape_marketplace(max_pages=500, output_csv="github_actions_data.csv"):
    all_products = []
    for page in range(1, max_pages + 1):
        # Scrape the current page and extend the products list
        products = scrape_page(page)
        if products:
            all_products.extend(products)
        time.sleep(random.uniform(1, 5))

    try:
        # Write all scraped products to a CSV file
        with open(output_csv, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["Name", "Description", "URL", "Page"])
            writer.writeheader()
            writer.writerows(all_products)
    except Exception as err:
        print("Error writing CSV:", err)

# ----- Part 2: Data Preprocessing Code -----
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download required NLTK data (recent NLTK releases need 'punkt_tab' for word_tokenize)
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

def remove_noise(text):
    # Remove HTML tags and non-alphabet characters, then convert text to lowercase
    text = re.sub(r'<[^>]+>', '', text)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text.lower().strip()

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def process_text(text):
    # Clean text, tokenize, remove stopwords, and lemmatize tokens
    cleaned_text = remove_noise(text)
    tokens = word_tokenize(cleaned_text)
    filtered_tokens = [lemmatizer.lemmatize(token) for token in tokens if token not in stop_words]
    return " ".join(filtered_tokens)

def preprocess_dataset(input_csv="github_actions_data.csv", output_csv="processed_actions_data.csv"):
    df = pd.read_csv(input_csv)
    # Remove duplicates and drop rows with missing critical values
    df.drop_duplicates(inplace=True)
    df.dropna(subset=["Name", "Description"], inplace=True)
    df.reset_index(drop=True, inplace=True)

    # Process text in relevant columns
    df["Name_Clean"] = df["Name"].astype(str).apply(process_text)
    df["Description_Clean"] = df["Description"].astype(str).apply(process_text)

    try:
        # Save the processed data to a new CSV file
        df.to_csv(output_csv, index=False, encoding="utf-8")
    except Exception as e:
        print("Error saving processed CSV:", e)

# ----- Main Execution -----
def main():
    # Run scraping (Part 1) and then preprocessing (Part 2)
    scrape_marketplace()
    preprocess_dataset()

if __name__ == "__main__":
    main()

# Question 5 (20 points)

PART 1:
Scrape tweets from Twitter using the Tweepy API, specifically targeting hashtags related to the subtopics (machine learning or artificial intelligence).
The extracted data includes the tweet ID, username, and text.

PART 2:
Perform data cleaning procedures.

A final data quality check ensures the completeness and consistency of the dataset. The cleaned data is then saved into a CSV file for further analysis.


**Note**

1.   Follow the tutorials provided in Canvas to obtain API keys. Use ChatGPT to get the code. Make sure the file is downloaded and saved.
2.   Make sure you divide the GPT code as shown in the tutorials; don't make multiple requests.


In [None]:
!pip install tweepy
import tweepy
import pandas as pd
import re

# PART 1: Twitter API setup and data extraction

# Read the bearer token from an environment variable rather than hardcoding the secret.
# TWITTER_BEARER_TOKEN is an assumed variable name; set it before running this cell.
import os
bearer_token = os.environ["TWITTER_BEARER_TOKEN"]

# Initialize the Tweepy client using the bearer token
client = tweepy.Client(bearer_token=bearer_token)

# Define the query to search for tweets with either hashtag (excluding retweets and filtering for English language)
query = "#machinelearning OR #artificialintelligence -is:retweet lang:en"

# Search for recent tweets (maximum 100 results in one request)
response = client.search_recent_tweets(
    query=query,
    tweet_fields=['id', 'text', 'author_id'],
    expansions=['author_id'],
    user_fields=['username'],
    max_results=100
)

# Extract tweets and their corresponding user information
tweets_data = []
if response.data is not None and 'users' in response.includes:
    # Create a mapping from user ID to user details for easy lookup
    users = {u.id: u for u in response.includes['users']}
    for tweet in response.data:
        # Get the username using the author_id
        author = users.get(tweet.author_id)
        tweets_data.append({
            "tweet_id": tweet.id,
            "username": author.username if author else None,
            "text": tweet.text
        })

# PART 2: Data cleaning and saving to CSV

# Function to clean tweet text
def clean_tweet(text):
    # Remove URLs
    text = re.sub(r'http\S+', '', text)
    # Remove user mentions
    text = re.sub(r'@\w+', '', text)
    # Remove the hash symbol (keeping the hashtag word if desired)
    text = re.sub(r'#', '', text)
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# Clean the text of each tweet and add a new key 'clean_text'
for tweet in tweets_data:
    tweet["clean_text"] = clean_tweet(tweet["text"])

# Final data quality check: Remove any tweets that result in an empty clean text
tweets_data = [tweet for tweet in tweets_data if tweet["clean_text"]]

# Convert the list of tweet dictionaries to a DataFrame
df = pd.DataFrame(tweets_data)

# Save the cleaned data into a CSV file
df.to_csv("cleaned_tweets.csv", index=False)
print("Cleaned tweets saved to cleaned_tweets.csv")

Cleaned tweets saved to cleaned_tweets.csv


# Mandatory Question

Provide your thoughts on the assignment. What did you find challenging, and what aspects did you enjoy? Your opinion on the provided time to complete the assignment.

Overall, it was a tedious process. The first question took me a lot of time. I tried different methods to collect the Amazon reviews but couldn't finish scraping the data. I then tried movie reviews but was not able to scrape the complete page. Using the modules and the learnings from Canvas, I was able to extract the abstracts. I learned and explored some web scraping methods, including BeautifulSoup and Selenium, which was interesting and fun. I had a lot of time but was not able to write complex code.

# Write your response below
Fill out survey and provide your valuable feedback.

https://docs.google.com/forms/d/e/1FAIpQLSd_ObuA3iNoL7Az_C-2NOfHodfKCfDzHZtGRfIker6WyZqTtA/viewform?usp=dialog