# AIDM7330 Automated Online Data Acquisition Challenge
## Individual Assignment (5% of final marks)

The use of generative AI is allowed (but not required) to complete the following tasks.

**You can only use the Python syntax introduced in class so far. For scraping, only the Python Requests and BeautifulSoup packages are allowed.**

Report your prompts in the "Appendix" section at the bottom.

### Submission
Please submit:
- your work in Jupyter notebook format (.ipynb) **FULLY EXECUTED WITH OUTPUT VISIBLE**, and
- a CSV as the output
using the submission box in the BU eLearning system.

- **Due date**: Oct 26, 23:59
- **Late submission**: You will lose 25% of the points for each day late.

## Your Name:

Liao Qinyi

## Your StudentID:

25477102

### Overview

The task is an automated online data acquisition challenge, i.e., to scrape some required information from webpages with Python Requests and BeautifulSoup packages in an automated manner, and store the information in a structured, machine-readable data format as the output.

### The task
You are requested to scrape a specific webpage from crypto.com.
You can find your URL on the list published on the BU eLearning system, associated with your Student ID.

### Q1. (20 marks) Quick warm-up: please figure out the following information from the webpage:
1. Please print the “title” of this webpage, as stipulated by the “head” section. [10 marks]
2. Please print the text of all the paragraphs (`<p>` tag) of the webpage [10 marks].


In [18]:
from bs4 import BeautifulSoup
import requests

url = 'https://crypto.com/en/price?page=10'
page = requests.get(url)

pageStatus = page.status_code
print("http status code:", pageStatus)
print('-' * 200)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find('title').text.strip()
print("title:", title)
print('-' * 200)

paragraph = soup.find_all('p')
print("all paragraph text:")
for p in paragraph:
    paragraphText = p.get_text().strip()
    if paragraphText:
        print(paragraphText)
        print('-' * 200)


http status code: 200
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
title: Cryptocurrency Prices, Live Charts, Market Cap, News | Crypto.com
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
all paragraph text:
Buy BTC and 400+ crypto
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Lower fee trading
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Web3 crypto wallet
--------------------
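
The warm-up steps can also be tried offline. The following is a minimal sketch, assuming a small hand-written HTML string (`SAMPLE_HTML`, not the real crypto.com page) in place of `page.content`; the helper names `page_title` and `paragraph_texts` are hypothetical. It also guards against a page with no `<title>` tag, where `soup.find('title')` returns `None` and calling `.text` on it would fail.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for page.content (an assumption, not the real page).
SAMPLE_HTML = """
<html>
  <head><title> Sample Prices | Example </title></head>
  <body>
    <p>Buy crypto</p>
    <p>   </p>
    <p>Lower fee <b>trading</b></p>
  </body>
</html>
"""


def page_title(soup):
    """Return the stripped <title> text, or None if the page has no title."""
    title_tag = soup.find('title')
    if title_tag is None:
        return None
    return title_tag.get_text().strip()


def paragraph_texts(soup):
    """Return the non-empty, stripped text of every <p> tag, in page order."""
    texts = []
    for p in soup.find_all('p'):
        text = p.get_text().strip()
        if text:  # skip paragraphs that contain only whitespace
            texts.append(text)
    return texts


soup_sample = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print("title:", page_title(soup_sample))
for text in paragraph_texts(soup_sample):
    print(text)
```

The parsed sample is stored in `soup_sample` so that it does not overwrite the `soup` object built from the real page.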

### Q2. (80 marks) Now the scraping.
1. For each cryptocurrency on the page, please scrape the following information: [60 marks]
   - the full name, e.g., Bitcoin;
   - the abbreviation, e.g., BTC;
   - the price;
   - the market cap.
2. You are expected to use at least one of the control flow statements (i.e., conditional statements, loops, and function calls) in the code. [10 marks]
3. You are expected to store the scraped information in a Pandas dataframe format and output the file in CSV format. [10 marks]
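
The three requirements fit together in a few lines. The following is a minimal sketch, assuming the values have already been scraped; the `records` data, the `build_table` function, and the rule of skipping rows without a name are all made up for illustration.

```python
import io

import pandas as pd

# Made-up records standing in for scraped values (not real market data).
records = [
    ("Coin A", "AAA", "$1.00", "$10M"),
    ("", "BBB", "$2.00", "$20M"),      # missing name, skipped below
    ("Coin C", "CCC", "$3.00", "$30M"),
]


def build_table(rows):
    """Build a DataFrame from (name, abbreviation, price, market cap) tuples."""
    kept = []
    for name, abbreviation, price, market_cap in rows:  # loop
        if name:  # conditional: keep only rows that have a full name
            kept.append({
                "Full Name": name,
                "Abbreviation": abbreviation,
                "Price": price,
                "Market Cap": market_cap,
            })
    return pd.DataFrame(kept)


table = build_table(records)  # function call

# Write the CSV to an in-memory buffer here; passing a file path works the same way.
buffer = io.StringIO()
table.to_csv(buffer, index=False)  # index=False leaves out the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```

Whether to keep the row-number column is a design choice: with the default `index=True`, `to_csv()` writes it as an extra first column with an empty header.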




In [20]:
import pandas as pd
import csv

fullnameList = list()
abbreviationList = list()
priceList = list()
marketcapList = list()

# scrape full name data
fullname = soup.find_all('p', attrs = {'style':"color:var(--content-primary)"})
print("number of full name found:", len(fullname))
for nameRow in fullname:
    fullnameList.append(nameRow.get_text())
print("full name list:", fullnameList)
print('-' * 200)

# scrape abbreviation data
abbreviation = soup.find_all('p', attrs = {'style':"color:var(--content-tertiary);max-width:calc(6.25rem * var(--mantine-scale))"})
print("number of abbreviation found:", len(abbreviation))
for abbRow in abbreviation:
    abbreviationList.append(abbRow.get_text())
print("abbreviation list:", abbreviationList)
print('-' * 200)

# scrape price data
price = soup.find_all('p', attrs = {'style':"max-width:calc(8.125rem * var(--mantine-scale))"})
print("number of price found:", len(price))
for priceRow in price:
    priceList.append(priceRow.get_text())
print("price list:", priceList)
print('-' * 200)

# scrape market cap data
# the 'body1' selector matches two cells per coin, so keep every other match
marketcap = soup.find_all('p', attrs={'data-variant':"body1"})
for i in range(0, len(marketcap), 2):
    marketcapList.append(marketcap[i].get_text())
print("number of market cap found:", len(marketcapList))
print("market cap list:", marketcapList)
print('-' * 200)

# create DataFrame
CryptoTable = pd.DataFrame({
  "Full Name": fullnameList,
  "Abbreviation": abbreviationList,
  "Price":priceList,
  "Market Cap":marketcapList,})
display(CryptoTable)

# Save to Google Drive
from google.colab import drive
drivePath = '/content/drive'
drive.mount(drivePath)
dataPath = drivePath + '/MyDrive/Colab Notebooks/assignment/'

import os, pathlib
if not(os.path.exists(dataPath)):
  path = pathlib.Path(dataPath)
  path.mkdir(parents=True, exist_ok=True)
  print('Path has been created')
else:
  print('The data path you selected already exists')

# save DataFrame as CSV file (outside the if/else so it runs in both cases)
CryptoTable.to_csv(dataPath + 'CryptoTable.csv')


number of full name found: 100
full name list: ['SuperRare', 'Autonolas', 'MOMOFUN', 'Turtle (turtle.xyz)', 'Braintrust', 'Oasys', 'CorgiAI', 'Initia', 'Maverick Protocol', 'LUFFY', 'Manyu', 'Treehouse', 'Vulcan Forged PYR', 'PepeCoin Cryptocurrency', 'CargoX', 'MARBLEX', 'Roam', 'Openverse Network', 'Solv Protocol', 'Chain-key Bitcoin', 'MovieBloc', 'Symbol', 'Arena-Z', 'IX Swap', 'Epic Chain', 'Electronic USD', 'MAI', 'XION', 'Multiplier', 'Polyhedra Network', 'Yooldo', 'Stader MaticX', 'Just a chill guy', 'Phoenix Global [old]', 'Everscale', 'Radicle', 'Hamster Kombat', 'Aquarius', 'Wrapped Sei', 'NAVX Token', 'Tensor', 'Access Protocol', 'Eco', 'Travala.com', 'Astherus Staked USDF', 'Metadium', 'Pepe Cash', 'Minswap', 'Ski Mask Dog', 'OpenServ', 'Syscoin', 'Automata Network', 'Wrapped FRAX', 'Firo', 'Solend', 'Compound Uni', 'Isiklar Coin', 'Chainbase', 'Wirex Token', 'Xphere', 'PinLink', 'Boost', 'MOBOX', 'GoPlus Security', 'SPDR S&P 500 Tokenized ETF (Ondo)', 'noice', 'Hegic', 'T

,Full Name,Abbreviation,Price,Market Cap
0,SuperRare,RARE,$0.03585,$29.56M
1,Autonolas,OLAS,$0.1637,$29.53M
2,MOMOFUN,MM,$0.003789,$29.56M
3,Turtle (turtle.xyz),TURTLE,$0.1907,$29.5M
4,Braintrust,BTRST,$0.1224,$29.54M
...,...,...,...,...
95,VeraOne,VRO,$81.113,$23.34M
96,Yala,YALA,$0.09478,$23.35M
97,Quai Network,QUAI,$0.02968,$23.23M
98,Mamo,MAMO,$0.05544,$23.52M


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
The data path you selected already exists
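
Building a table from four separate lists relies on the lists lining up item by item, and the counts printed above give a quick manual check of that. The following is a minimal sketch of an explicit check, assuming made-up sample lists; `check_same_length` and `to_rows` are hypothetical helpers, not part of the submitted code.

```python
def check_same_length(columns):
    """Raise ValueError if the lists in the dict do not all have the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


def to_rows(columns):
    """Turn a dict of equal-length lists into a list of per-row dicts."""
    check_same_length(columns)
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Made-up sample values (not real market data).
sample = {
    "Full Name": ["Coin A", "Coin B"],
    "Abbreviation": ["AAA", "BBB"],
    "Price": ["$1.00", "$2.00"],
    "Market Cap": ["$10M", "$20M"],
}
rows = to_rows(sample)
print(rows[0])

# A list that is one item short is reported with the per-column counts.
broken = dict(sample, Price=["$1.00"])
try:
    to_rows(broken)
except ValueError as error:
    print("check failed:", error)
```

Reporting the count per column makes it easier to see which selector matched too many or too few elements.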


# Declaration (mandatory, please complete the following)

I did knowingly use generative AI tools in this assignment task.

## Acknowledgment
If you did use any generative AI tools, please complete the following acknowledgment:

* In this assignment, I followed the University’s guidelines for students on academic integrity. No content generated by generative AI tools has been presented as my own work. I take responsibility for the work submitted.

* **Process**: In my assignment preparation, I acknowledge the use of generative AI tools to explain Python syntax and to correct my answers, which helped me understand unfamiliar syntax and find the most appropriate solution.

* **Record**: I have kept a record of my use of AI tools, including the specific tool(s) used, the prompts submitted, and responses generated.

I understand that my teachers may ask me to provide this information.

# Appendix (mandatory)
Report below your prompts with a brief description of their usage.

- **Prompt**: "Use attrs method, fullname = soup.find('fullname', attrs={}), url is https://crypto.com/en/price?page=10"  
  **Usage**: Explored how to use the `attrs={}` parameter in BeautifulSoup's `find()` method and clarified the correct use of HTML tags and attributes.

- **Prompt**: "How to use select"  
  **Usage**: Asked for a comprehensive explanation of BeautifulSoup's `select()` method and its use of CSS selectors.

- **Prompt**: "The result is like this, help me filter to show only full names, no prices or abbreviations"  
  **Usage**: Requested data filtering techniques to isolate cryptocurrency full names from mixed output containing prices and abbreviations.

- **Prompt**: "How can I extract only the market cap and not the 24h volume from this HTML?"  
  **Usage**: Asked how to extract only the market cap values from complex HTML structure.

- **Prompt**: "What's the difference between find_all and find"  
  **Usage**: Asked to clarify the technical differences between BeautifulSoup's `find_all()` and `find()` methods for element selection.

- **Prompt**: "What the hell is append"  
  **Usage**: Requested basic explanation of Python's `append()` method and its role in list operations.

- **Prompt**: "What does strip=True mean?"  
  **Usage**: Asked for explanation of the `strip=True` argument in the `get_text()` method for text cleaning.

- **Prompt**: "Besides enumerate, are there more beginner-friendly syntaxes for finding odd rows"  
  **Usage**: Asked for alternative approaches to list filtering that are easier for beginners to understand.

- **Prompt**: "Explain for i in range(0, len(marketcap), 2)"  
  **Usage**: Requested a detailed explanation of the range stepping logic used in data filtering.

- **Prompt**: "Why is there no output when running"  
  **Usage**: Asked for help troubleshooting web scraping code that returned no results because of HTML parsing issues.

- **Prompt**: "What's wrong" - with provided code snippet  
  **Usage**: Asked to identify variable naming inconsistencies and syntax errors in the data processing code.
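
Several of the prompts above concern the same few calls. The following is a minimal sketch, assuming a small hand-written HTML snippet (the `cell` class name and the values are made up, not crypto.com markup), that contrasts `find()` with `find_all()` and shows `get_text(strip=True)`, `select()`, `append()`, and stepping through a list with `range(0, n, 2)`.

```python
from bs4 import BeautifulSoup

# Hand-written snippet; the "cell" class and the values are made up.
SNIPPET = """
<div>
  <p class="cell"> $10M </p>
  <p class="cell"> $1M </p>
  <p class="cell"> $20M </p>
  <p class="cell"> $2M </p>
</div>
"""
doc = BeautifulSoup(SNIPPET, 'html.parser')

# find() returns only the first match (or None); find_all() returns every match.
first = doc.find('p', attrs={'class': 'cell'})
cells = doc.find_all('p', attrs={'class': 'cell'})

# get_text(strip=True) strips whitespace from the start and end of each piece of text.
first_text = first.get_text(strip=True)

# select() takes a CSS selector and, like find_all(), returns every match.
selected = doc.select('p.cell')

# range(0, n, 2) yields 0, 2, 4, ... so only every other item is kept;
# append() adds each kept item to the end of the list.
every_other = []
for i in range(0, len(cells), 2):
    every_other.append(cells[i].get_text(strip=True))

print(first_text)
print(len(cells), len(selected))
print(every_other)
```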