# Data Collection

Data collection involves gathering data from various sources, transforming it into a useful format, and storing it for further processing and analysis.

## Web Scraping

Web scraping is the process of extracting data from websites. In Python, you can use the `requests` package to download an HTML page and the `beautifulsoup4` package (imported as `bs4`) to parse it. Here's an example of how to scrape a list of articles from a news website:

In [1]:
# !pip3 install requests
# !pip3 install beautifulsoup4

In [2]:
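import requests
from bs4 import BeautifulSoup

# Placeholder URL; replace it with the page you want to scrape.
url = 'https://www.example.com/news'
response = requests.get(url, timeout=10)  # time out instead of hanging forever
soup = BeautifulSoup(response.text, 'html.parser')

articles = []
for article in soup.find_all('article'):
    heading = article.find('h2')
    anchor = article.find('a')
    paragraph = article.find('p')
    # Skip entries that lack any of the expected elements.
    if heading is None or anchor is None or paragraph is None:
        continue
    articles.append({
        'title': heading.get_text(strip=True),
        'link': anchor.get('href'),
        'summary': paragraph.get_text(strip=True),
    })

print(articles)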

[]
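
The empty list above means the loop found no `<article>` elements on the placeholder page. To see what the same parsing logic returns when such elements exist, here is a minimal offline sketch that runs on an inline HTML string instead of a live site. The markup in `SAMPLE_HTML` and the helper name `parse_articles` are made up for illustration; real pages will use different tags and need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real news page.
SAMPLE_HTML = """
<html><body>
  <article>
    <h2>First headline</h2>
    <a href="/news/first">Read more</a>
    <p>Summary of the first story.</p>
  </article>
  <article>
    <h2>Second headline</h2>
    <p>This story has no link.</p>
  </article>
</body></html>
"""


def parse_articles(html):
    """Return one dict per <article>, using None for any missing piece."""
    soup = BeautifulSoup(html, 'html.parser')
    articles = []
    for article in soup.find_all('article'):
        heading = article.find('h2')
        anchor = article.find('a')
        paragraph = article.find('p')
        articles.append({
            'title': heading.get_text(strip=True) if heading else None,
            'link': anchor.get('href') if anchor else None,
            'summary': paragraph.get_text(strip=True) if paragraph else None,
        })
    return articles


sample_articles = parse_articles(SAMPLE_HTML)
print(sample_articles)
```

Keeping the parsing in a function that takes an HTML string makes it easy to try selectors on saved pages before pointing the scraper at a live URL.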


## APIs

APIs are interfaces that let you access data from other applications or websites. In Python, you can use the `requests` module to make API requests and parse the JSON they return. Here's an example of how to retrieve data from the OpenWeatherMap API. Replace `your_api_key` with your own key; with the placeholder, the API rejects the request with a 401 error:

In [3]:
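import requests

# Passing the query as params lets requests handle the URL encoding.
url = 'https://api.openweathermap.org/data/2.5/weather'
params = {'q': 'London,uk', 'appid': 'your_api_key'}  # replace with your key
response = requests.get(url, params=params, timeout=10)

# Parse the JSON body into a Python dict.
data = response.json()
print(data)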

{'cod': 401, 'message': 'Invalid API key. Please see https://openweathermap.org/faq#error401 for more info.'}
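
Because the API reports failures inside the JSON body, it helps to check the `cod` field before reading anything else. The sketch below is one possible way to do that, not an official client. The helper name `summarize_weather` is hypothetical, and the success-case field names (`name`, `main.temp`, `weather[0].description`) and the Kelvin unit are assumptions about the response shape that you should verify against the OpenWeatherMap documentation.

```python
def summarize_weather(data):
    """Turn a parsed response dict into a short, human-readable string."""
    # 'cod' may arrive as an int or a string, so normalise before comparing.
    if str(data.get('cod')) != '200':
        message = data.get('message', 'no message')
        return f"Request failed ({data.get('cod')}): {message}"

    # Assumed success-case fields; use defaults if any are missing.
    city = data.get('name', 'unknown location')
    temp_kelvin = data.get('main', {}).get('temp')
    conditions = data.get('weather') or [{}]
    description = conditions[0].get('description', 'no description')
    return f"{city}: {description}, {temp_kelvin} K"


# The error response shown in the output above.
error_payload = {
    'cod': 401,
    'message': 'Invalid API key. Please see '
               'https://openweathermap.org/faq#error401 for more info.',
}
print(summarize_weather(error_payload))

# Hypothetical success payload with made-up values, for illustration only.
success_payload = {
    'cod': 200,
    'name': 'London',
    'main': {'temp': 285.0},
    'weather': [{'description': 'light rain'}],
}
print(summarize_weather(success_payload))
```

Using `.get()` with defaults keeps the helper from raising `KeyError` when a field is absent, which is useful while you are still exploring an unfamiliar response.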
