<a href="https://colab.research.google.com/github/dmorton714/code-You_DA_demos/blob/main/m2w1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import pandas as pd                 # tabular data (DataFrames)
import requests                     # HTTP requests
from bs4 import BeautifulSoup       # HTML parsing

# Local file import

In [4]:
local_file = pd.read_csv('data.csv')

In [5]:
local_file.head()

Unnamed: 0.1,Unnamed: 0,name,age,gender,race,income
0,0,Person0,38,Male,Asian,91235
1,1,Person1,54,Female,Black,84464
2,2,Person2,21,Male,Other,57546
3,3,Person3,57,Female,Hispanic,68550
4,4,Person4,24,Male,Other,18592
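The `Unnamed: 0.1` and `Unnamed: 0` columns above are pandas' default names for CSV columns with a blank or duplicated header. They usually come from a DataFrame index that was written into the file by `to_csv()`; that is an inference from the output, not something the notebook states. The sketch below reproduces the effect with a small, made-up CSV string in place of `data.csv` and shows two common fixes.

```python
import io

import pandas as pd

# Made-up CSV text standing in for data.csv: the empty first header is what
# DataFrame.to_csv() writes for the index when index=False is not passed.
csv_text = ",name,age\n0,Person0,38\n1,Person1,54\n"

# Default read: the headerless column becomes a data column named "Unnamed: 0".
df_raw = pd.read_csv(io.StringIO(csv_text))

# Fix 1: treat the first column as the index instead of as data.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Fix 2: keep the default read, then drop every auto-named "Unnamed..." column.
df_clean = df_raw.loc[:, ~df_raw.columns.str.startswith("Unnamed")]

print(df_raw.columns.tolist())      # ['Unnamed: 0', 'name', 'age']
print(df_indexed.columns.tolist())  # ['name', 'age']
print(df_clean.columns.tolist())    # ['name', 'age']
```

To stop the extra column from appearing in the first place, write the file with `df.to_csv("data.csv", index=False)`.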


## Basic API Call

Data from: https://pipedream.com/@pravin/http-api-for-latest-wuhan-coronavirus-data-2019-ncov-p_G6CLVM/readme

### **The API returns:**

*   **Summary stats** (counts of cases, recoveries, and deaths)

    *   Global
    *   Mainland China
    *   Non-Mainland China

* **Raw data** (counts by region as published in the Google Sheet)

* **Metadata** (including when data was last published and the cache status)

- Line 1 assigns the endpoint URL for the HTTP GET request to `url`.
- Line 2 sends an HTTP GET request to that URL with `requests.get` and stores the returned `Response` object in `r`.
- Line 3 displays `r`; `<Response [200]>` means the server answered with status code 200 (OK).

In [6]:
url = "https://coronavirus.m.pipedream.net/"
r = requests.get(url)
r

<Response [200]>
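`requests.get(url)` as written has no timeout and does not check the status code before the body is used. A minimal sketch of a more defensive call follows; `fetch_json` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice, not a requirement of this API.

```python
import requests


def fetch_json(url, timeout=10):
    """Minimal sketch: GET `url` and return the decoded JSON body.

    - `timeout` (seconds) keeps the call from hanging if the server never
      answers; requests applies no timeout unless one is given.
    - raise_for_status() raises requests.HTTPError for 4xx/5xx responses,
      so an error page is never mistaken for data.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Usage in this notebook (needs network access and a live endpoint):
# data = fetch_json("https://coronavirus.m.pipedream.net/")
```

Raising early means a 404 or 500 shows up as a clear `requests.HTTPError` at the request itself, rather than as a confusing error later when the body is parsed.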

The line `json = r.json()` decodes the JSON body of the response `r` into Python objects (a `dict` at the top level here) and stores the result in `json`, so the data can be navigated with ordinary dictionary and list operations.

In [7]:
json = r.json()
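What `r.json()` does can be reproduced offline with the standard library: it decodes the response body much like `json.loads(r.text)` would. The sketch below parses a short, made-up JSON string (not the real API payload) and uses the name `data` rather than `json`, so the `json` module itself is not shadowed.

```python
import json

# Made-up JSON text with a placeholder shape; NOT the real API payload.
raw_text = '{"rawData": [{"Country_Region": "Country A", "Confirmed": 100}]}'

# json.loads turns a JSON string into Python objects:
# JSON object -> dict, JSON array -> list, JSON integer -> int.
data = json.loads(raw_text)

print(type(data).__name__)              # dict
print(type(data["rawData"]).__name__)   # list
print(data["rawData"][0]["Confirmed"])  # 100
```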

`json.keys()` returns the top-level keys of the dictionary stored in `json`. Listing the keys is a quick way to see how an unfamiliar JSON response is organized before extracting data from it.

In [8]:
json.keys()

dict_keys(['summaryStats', 'cache', 'dataSource', 'apiSourceCode', 'rawData'])
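Keys alone do not show what each entry holds. A small helper, `describe_keys` (a hypothetical name, not part of pandas or requests), can report the type and size of each top-level value. It is demonstrated on an invented dictionary; in this notebook it would be called as `describe_keys(json)`.

```python
def describe_keys(obj):
    """Return (key, type name, length or None) for each top-level entry of a dict."""
    summary = []
    for key, value in obj.items():
        # Containers and strings have a length; numbers and None do not.
        length = len(value) if hasattr(value, "__len__") else None
        summary.append((key, type(value).__name__, length))
    return summary


# Invented stand-in for a parsed response; keys and values are placeholders.
sample = {"stats": {"a": 1, "b": 2}, "records": [{}, {}, {}], "updated": None}

for row in describe_keys(sample):
    print(row)
# ('stats', 'dict', 2)
# ('records', 'list', 3)
# ('updated', 'NoneType', None)
```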

`df = pd.DataFrame(json.get('rawData', []))` creates a DataFrame (`df`) from the data stored under the `'rawData'` key of the `json` dictionary. Because `dict.get` returns its default (`[]`) when the key is missing, a missing `'rawData'` produces an empty DataFrame instead of raising a `KeyError`.

In [9]:
df = pd.DataFrame(json.get('rawData', []))
df.head()


Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
0,,,,Afghanistan,2023-03-10 04:21:03,33.93911,67.709953,209451,7896,,,Afghanistan,538.0424508714615,3.76985547932452
1,,,,Albania,2023-03-10 04:21:03,41.1533,20.1683,334457,3598,,,Albania,11621.96817012996,1.075773567304616
2,,,,Algeria,2023-03-10 04:21:03,28.0339,1.6596,271496,6881,,,Algeria,619.132365905185,2.534475646050034
3,,,,Andorra,2023-03-10 04:21:03,42.5063,1.5218,47890,165,,,Andorra,61981.49226687375,0.3445395698475673
4,,,,Angola,2023-03-10 04:21:03,-11.2027,17.8739,105288,1933,,,Angola,320.35277020195906,1.835916723653218
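The `.get` fallback and the record-to-row conversion can be checked without calling the API. This sketch assumes, consistent with the table above but not confirmed by the API documentation, that `'rawData'` is a list of dictionaries; with that shape, `pd.DataFrame` makes one row per dictionary and one column per key. The country names and counts are placeholders.

```python
import pandas as pd

# Hypothetical parsed responses: one with a "rawData" list of records, one without.
with_data = {"rawData": [
    {"Country_Region": "Country A", "Confirmed": 100, "Deaths": 1},
    {"Country_Region": "Country B", "Confirmed": 250, "Deaths": 5},
]}
without_data = {"summaryStats": {}}

df = pd.DataFrame(with_data.get("rawData", []))        # one row per record
empty = pd.DataFrame(without_data.get("rawData", []))  # key missing -> []

print(df.shape)          # (2, 3)
print(list(df.columns))  # ['Country_Region', 'Confirmed', 'Deaths']
print(empty.empty)       # True
```

After building the real `df`, `df.dtypes` shows whether numeric fields such as `Confirmed` arrived as numbers or as strings, which matters before doing any arithmetic on them.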


# Web Scraping

## Steps:
### Define the URL:

1. Set the URL of the Wikipedia article on the Python programming language.

### HTTP GET Request and HTML Parsing:

2. **Send HTTP GET Request:**
   - `requests.get(url).text`: Send an HTTP GET request to the URL and retrieve the HTML content.

3. **Parse HTML Content:**
   - `BeautifulSoup(..., 'html.parser')`: Parse the HTML content using BeautifulSoup.

### Print the Title:

4. **Print Title:**
   - `print("Title:", soup.title.text)`: Print the title of the Wikipedia page.

### Extract and Print the Second Paragraph:

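5. **Find and Extract Second Paragraph:**
   - `soup.find_all('p')[1].text.strip()`: Find all `<p>` elements, select the second one (index `1`, because Python indexing starts at 0), and extract its text with surrounding whitespace removed. The second `<p>` element is not necessarily the second visible paragraph: in the output below it holds the article's opening sentence, which suggests the first `<p>` on the page has no visible text.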

6. **Print Second Paragraph:**
   - `print("Second Paragraph:", second_paragraph)`: Print the text of the second paragraph.


In [10]:
# Wikipedia page URL
url = "https://en.wikipedia.org/wiki/Python_(programming_language)"

# Send an HTTP GET request to the URL and parse the HTML content
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Print the title
print("Title:", soup.title.text)

# Extract and print the text from the second paragraph
second_paragraph = soup.find_all('p')[1].text.strip()
print("Second Paragraph:", second_paragraph)

Title: Python (programming language) - Wikipedia
Second Paragraph: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.[31]
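The fixed index `[1]` depends on the current page layout, and the scraped text still carries the footnote marker `[31]`. A more defensive sketch follows, with two hypothetical helpers. `fetch_html` adds a timeout, a status check, and a `User-Agent` header (Wikimedia asks automated clients to identify themselves; the string below is only a placeholder). `paragraph_texts` skips blank `<p>` elements and removes footnote markers, assuming they sit in `<sup class="reference">` tags as they do on Wikipedia article pages at the time of writing.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder User-Agent; replace with your own project name and contact details.
HEADERS = {"User-Agent": "example-notebook/0.1 (contact: you@example.com)"}


def fetch_html(url, timeout=10):
    """GET a page and return its HTML, raising requests.HTTPError on 4xx/5xx."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


def paragraph_texts(html):
    """Return the stripped text of each non-blank <p>, with footnote markers removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: markers such as "[31]" live in <sup class="reference"> tags.
    for marker in soup.find_all("sup", class_="reference"):
        marker.decompose()  # delete the tag and its text from the tree
    texts = (p.get_text().strip() for p in soup.find_all("p"))
    return [text for text in texts if text]  # drop empty paragraphs


# Usage (needs network access):
# html = fetch_html("https://en.wikipedia.org/wiki/Python_(programming_language)")
# print(paragraph_texts(html)[0])
```

Filtering out blank paragraphs means the first entry of the list is the first paragraph with visible text, regardless of how many empty `<p>` elements precede it.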
