
# 📂 Connecting to External Data Sources

This notebook provides **ready-to-use code templates** for connecting to various data sources, including APIs, PostgreSQL, and MongoDB.

### 🔹 When to Use Each Method:
- **APIs**: When retrieving data from external services (e.g., weather, financial data, news APIs).
- **PostgreSQL**: When working with structured, relational databases.
- **MongoDB**: When handling semi-structured or NoSQL data.


In [None]:

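# Install the required libraries (uncomment if necessary).
# psycopg2-binary ships a prebuilt wheel; plain psycopg2 compiles from source and needs the libpq headers.
# !pip install pandas requests sqlalchemy psycopg2-binary pymongo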



## 🌍 Connecting to an API

✅ Understand the **API authentication method** (e.g., API keys, OAuth).  
✅ Handle **rate limiting** (use caching if necessary).  
✅ Use **pagination** for large datasets.


In [None]:

import requests
import pandas as pd

# Define API endpoint and parameters
url = "https://api.example.com/data"
params = {"limit": 100, "format": "json"}

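# Send GET request (the timeout prevents the call from hanging indefinitely)
try:
    response = requests.get(url, params=params, timeout=10)
except requests.RequestException as exc:
    # Covers connection errors, DNS failures, and timeouts
    print(f"Request failed: {exc}")
else:
    # Convert response to DataFrame
    if response.status_code == 200:
        data = response.json()  # Assumes the body is a JSON list of records
        df = pd.DataFrame(data)
        print(df.head())
    else:
        print(f"Error: {response.status_code}, {response.text}")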
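
The pagination and rate-limiting points in the checklist above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not a definitive client: `fetch_all`, `RateLimited`, and page-number pagination are illustrative choices (many APIs use cursors or offsets instead), and the fetcher is passed in as a function so the loop can be exercised without a network call.

```python
import time
from typing import Callable, Iterator, List


class RateLimited(Exception):
    """Hypothetical signal: the fetcher raises this on HTTP 429 (Too Many Requests)."""


def fetch_all(
    fetch_page: Callable[[int], List[dict]],
    max_pages: int = 1000,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield records page by page until the API returns an empty page."""
    for page in range(1, max_pages + 1):
        for attempt in range(max_retries + 1):
            try:
                records = fetch_page(page)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # Give up after the last allowed retry
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * 2 ** attempt)
        if not records:
            return  # An empty page marks the end of the data
        yield from records
```

With `requests`, the fetcher could be a function that calls `requests.get(url, params={"page": page, "limit": 100}, timeout=10)`, raises `RateLimited` when `response.status_code == 429`, and otherwise returns `response.json()`. The `page` parameter name is an assumption; check the API's documentation for its actual pagination scheme. The result feeds straight into pandas: `df = pd.DataFrame(list(fetch_all(fetcher)))`.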



## 🗄️ Connecting to a PostgreSQL Database

✅ Ensure the **database driver** is installed (`psycopg2` for PostgreSQL).  
✅ Use **environment variables** instead of hardcoding credentials.  
✅ **Index the columns** used in `WHERE`, `JOIN`, and `ORDER BY` clauses to speed up queries.


In [None]:

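import os

import pandas as pd
from sqlalchemy import create_engine, text

# Read the connection string from an environment variable instead of hardcoding it.
# DATABASE_URL is an assumed variable name; the fallback below is a placeholder only.
DATABASE_URL = os.environ.get(
    "DATABASE_URL", "postgresql://user:password@localhost:5432/mydatabase"
)

# Create connection engine
engine = create_engine(DATABASE_URL)

# Load data into a Pandas DataFrame
query = text("SELECT * FROM my_table")
with engine.connect() as conn:
    df = pd.read_sql(query, conn)

print(df.head())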
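
When credentials are supplied as separate environment variables rather than one ready-made URL, the connection string can be assembled in code. The sketch below assumes the standard libpq variable names (`PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT`, `PGDATABASE`); `build_postgres_url` is a hypothetical helper name. The user name and password are URL-escaped so that characters such as `@` or `/` do not break the URL.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def build_postgres_url(env: Mapping[str, str] = os.environ) -> str:
    """Assemble a PostgreSQL URL from environment-style settings."""
    user = quote_plus(env["PGUSER"])          # Required; KeyError if missing
    password = quote_plus(env["PGPASSWORD"])  # Escaped so '@' or '/' stay intact
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    database = env["PGDATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The result can be passed directly to SQLAlchemy, for example `engine = create_engine(build_postgres_url())`. A missing required variable raises `KeyError` immediately, which is usually easier to diagnose than a failed connection later.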



## 🗂️ Connecting to MongoDB

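✅ Reuse a single `MongoClient` per process; it maintains a **connection pool** internally.  
✅ Ensure **indexes are properly set up** for the fields you query on.  
✅ Avoid **storing very large text or binary fields** inside documents (each document is capped at 16 MB); consider storing file links instead.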


In [None]:

from pymongo import MongoClient

# Connect to MongoDB (replace the URI with your own; keep any credentials out of the notebook)
client = MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
collection = db["mycollection"]

# Retrieve and convert data to DataFrame
data = list(collection.find({}, {"_id": 0}))  # Exclude the `_id` field (an ObjectId)
df = pd.DataFrame(data)

print(df.head())
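
Semi-structured documents often contain nested fields, which end up as whole dictionaries inside a single DataFrame column. One way to handle this, sketched below, is to flatten each document into dotted column names before building the DataFrame. `flatten` is a hypothetical helper written for illustration; pandas also provides `pd.json_normalize`, which serves a similar purpose.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries into one level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into sub-documents, carrying the dotted prefix along
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept as-is
            flat[name] = value
    return flat
```

Applied to the query above, this could look like `df = pd.DataFrame([flatten(d) for d in collection.find({}, {"_id": 0})])`. Lists are left untouched here; whether to explode them into rows depends on the analysis.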



## ✅ Best Practices & Common Pitfalls

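- **Security**: Never expose API keys or database credentials in notebooks; load them from environment variables or a secrets manager.  
- **Performance**: Fetch large datasets in batches, and add a `LIMIT` clause (or an API `limit` parameter) while exploring.  
- **Error Handling**: Always check for failed API/database connections, and set timeouts so a dead server cannot hang the notebook.  
- **Data Cleaning**: Check for missing values (e.g., `df.isna().sum()`) before using imported data.  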
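
The batching advice can be illustrated with the DB-API `fetchmany` method. The sketch below uses an in-memory SQLite database as a stand-in so that it runs anywhere; `iter_batches` is a hypothetical helper, and the table contents are made up for the demonstration. The same loop works with a psycopg2 cursor, though with PostgreSQL a named (server-side) cursor is typically needed for the batching to actually limit client memory.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_batches(cursor, batch_size: int = 1000) -> Iterator[List[Tuple]]:
    """Yield lists of rows from an executed DB-API cursor, batch_size at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in data: an in-memory SQLite table with 10 rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"row{i}") for i in range(10)],
)

cursor = conn.execute("SELECT id, value FROM my_table ORDER BY id")
batch_sizes = [len(batch) for batch in iter_batches(cursor, batch_size=4)]
print(batch_sizes)  # → [4, 4, 2]
conn.close()
```

When working through pandas instead, `pd.read_sql(query, conn, chunksize=n)` returns an iterator of DataFrames and follows the same idea.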
