**Overview**

This code demonstrates a simplified Extract, Transform, Load (ETL) process using a public API. It fetches data from the Open Library API, inspects its structure, and then attempts to convert it into a table-like format for analysis using the pandas library.

**Imports**

In [None]:
import requests
import pandas as pd
import json

*   requests: This library is used to make HTTP requests, allowing the code to fetch data from the API.
*   pandas: This library provides powerful data structures like the DataFrame, a table of labeled rows and columns.
*   json: This library is used to work with JSON data, which is a common format for data exchange on the web.





**1. Select API and Fetch Data**

In [None]:
api_url = "https://openlibrary.org/search.json?q=the+lord+of+the+rings"

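try:
    response = requests.get(api_url, timeout=10)  # do not hang forever on a stalled connection
    response.raise_for_status()  # raise HTTPError on a 4xx/5xx status
    data = response.json()
except requests.exceptions.RequestException as e:
    print(f"Error fetching data from API: {e}")
    raise SystemExit(1)  # stop here; the later cells depend on `data`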

*   api_url: This variable stores the URL of the Open Library API endpoint. It's designed to search for books related to "The Lord of the Rings."
*   requests.get(api_url): This line sends a GET request to the API to fetch data.
*   response.raise_for_status(): This raises an HTTPError if the server returned an error status (4xx or 5xx). For a successful response it does nothing.
*   data = response.json(): This line converts the API response (which is in JSON format) into a Python dictionary called data.
*   The try...except block handles potential errors during the API request, printing an error message and exiting if something goes wrong.
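
The api_url above has its query hand-encoded as the+lord+of+the+rings. As a minimal sketch, the same URL can be built from a plain search phrase with the standard library, which keeps the encoding correct for other queries. The helper name build_search_url is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlencode

# Base endpoint taken from api_url in the cell above.
SEARCH_ENDPOINT = "https://openlibrary.org/search.json"


def build_search_url(query: str) -> str:
    """Return the search URL for a free-text query (hypothetical helper)."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


api_url = build_search_url("the lord of the rings")
print(api_url)  # https://openlibrary.org/search.json?q=the+lord+of+the+rings
```

requests can perform the same encoding itself if the query is passed as params={'q': ...} to requests.get.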

**2. Inspect Data Format and Schema**

In [None]:
print("Data type:", type(data))
print("Keys in the JSON response:", data.keys())

if 'docs' in data:
    print("Example of a single record schema:", data['docs'][0].keys() if data['docs'] else "No documents found.")
else:
    print("No 'docs' key in the response. Check the API response format.")

Data type: <class 'dict'>
Keys in the JSON response: dict_keys(['numFound', 'start', 'numFoundExact', 'docs', 'num_found', 'q', 'offset'])
Example of a single record schema: dict_keys(['author_alternative_name', 'author_key', 'author_name', 'contributor', 'cover_edition_key', 'cover_i', 'ddc', 'ebook_access', 'ebook_count_i', 'edition_count', 'edition_key', 'first_publish_year', 'first_sentence', 'format', 'has_fulltext', 'ia', 'ia_collection', 'ia_collection_s', 'isbn', 'key', 'language', 'last_modified_i', 'lcc', 'lccn', 'lending_edition_s', 'lending_identifier_s', 'number_of_pages_median', 'oclc', 'osp_count', 'printdisabled_s', 'public_scan_b', 'publish_date', 'publish_place', 'publish_year', 'publisher', 'seed', 'title', 'title_sort', 'title_suggest', 'type', 'id_goodreads', 'id_librarything', 'id_dnb', 'id_doi', 'id_amazon', 'id_depósito_legal', 'id_alibris_id', 'id_google', 'id_paperback_swap', 'id_wikidata', 'id_better_world_books', 'id_overdrive', 'id_canadian_national_library



*   print("Data type:", type(data)): This line displays the data type of the data variable, which should be a dictionary.

*   print("Keys in the JSON response:", data.keys()): This line prints the top-level keys within the JSON data to understand its structure.


*   The if statement checks whether the response contains a key called docs. If it does and the list is not empty, it prints the keys of the first record to show the schema of an individual entry. If the docs list is empty, or the 'docs' key is missing, it prints a message instead.
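
The inspection logic above can also be wrapped in a small function so that the missing and empty cases are handled in one place. This is only a sketch: first_record_keys and the sample payload are made up for illustration and simply mirror the shape of the response shown above.

```python
def first_record_keys(payload: dict) -> list:
    """Return the sorted field names of the first record under 'docs'.

    Returns an empty list when 'docs' is missing or empty.
    """
    docs = payload.get("docs") or []
    return sorted(docs[0].keys()) if docs else []


# Illustrative payload, not real API output.
sample = {
    "numFound": 1,
    "docs": [{"title": "The Lord of the Rings", "author_name": ["J.R.R. Tolkien"]}],
}

print(first_record_keys(sample))        # ['author_name', 'title']
print(first_record_keys({"docs": []}))  # []
print(first_record_keys({}))            # []
```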





**3. Convert to DataFrame and Analyze**

In [None]:
try:
    df = pd.json_normalize(data['docs'])

    print("\nFirst 5 rows of the DataFrame:")
    print(df.head())

    print("\nDataFrame info:")
    df.info()  # info() prints its summary itself and returns None

    print("\nDescriptive statistics:")
    print(df.describe(include='all'))

except KeyError as e:
    # str(KeyError) already includes quotes around the key name
    print(f"Error: Key {e} not found in the API response. Data format may not be suitable for direct DataFrame conversion.")
    print("Raw API Response:")
    print(json.dumps(data, indent=2))
except Exception as e:
    print(f"An error occurred during dataframe creation: {e}")


First 5 rows of the DataFrame:
                             author_alternative_name  author_key  \
0  [J R R Tolkien, John Ronald Reuel Tolkien, Dzh...  [OL26320A]   
1  [Dzhon R. R. Tolkin, J. R.R. Tolkien, Tolkien,...  [OL26320A]   
2  [Yue Han Luo Na De Rui Er Tuo Er Jin, J R R To...  [OL26320A]   
3  [Yue Han Luo Na De Rui Er Tuo Er Jin, J R R To...  [OL26320A]   
4  [J R R Tolkien, J.R.R.Tolkien, John Ronald Reu...  [OL26320A]   

        author_name                                        contributor  \
0  [J.R.R. Tolkien]  [Kořínek, Otakar, 1946-, Tolkien, J. R. R. 189...   
1  [J.R.R. Tolkien]  [Lee, Alan., Ipek, Çigdem Erkal., Tolkien, J....   
2  [J.R.R. Tolkien]  [Alan Lee (Illustrator), Grathmer, Ingahild., ...   
3  [J.R.R. Tolkien]  [Matilde Horne (Translator), Grathmer, Ingahil...   
4  [J.R.R. Tolkien]                                                NaN   

  cover_edition_key     cover_i  \
0       OL51694024M  14625765.0   
1       OL51708686M  14627060.0   
2       O



*   df = pd.json_normalize(data['docs']): This attempts to convert the 'docs' section of the JSON data (which likely contains a list of book records) into a pandas DataFrame. A DataFrame is a table-like structure that's easier to analyze.

*   print(df.head()): This displays the first 5 rows of the DataFrame.

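*   df.info(): This prints a summary of the DataFrame, such as the data type of each column and its number of non-null values. It writes the summary itself and returns None, so wrapping it in print() also prints a stray "None".
*   print(df.describe(include='all')): This calculates and displays descriptive statistics for the DataFrame columns: count, unique, top, and freq for object columns, plus mean, standard deviation, minimum, maximum, and quartiles for numeric ones. Columns whose cells hold lists (such as author_name here) can make this call raise a TypeError, because lists are not hashable. That error would be caught by the generic except block.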


*   The try...except block handles potential errors during the DataFrame creation and analysis. If a KeyError occurs (e.g., if the 'docs' key is missing), it prints an error message followed by the raw API response. Any other exception is reported by the generic except block.
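
Several fields in docs, such as author_name, hold lists, which is visible in the head() output above. One possible way to make such columns easier to summarize is to join each list into a single string before building the DataFrame. The sketch below is an assumption about how that could be done, not part of the original notebook. The name flatten_record, the "; " separator, and the sample record are illustrative.

```python
def flatten_record(record: dict, sep: str = "; ") -> dict:
    """Return a copy of record with list values joined into one string."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            # Convert every item to str so mixed-type lists also join cleanly.
            flat[key] = sep.join(str(item) for item in value)
        else:
            flat[key] = value
    return flat


# Illustrative record shaped like one entry of data['docs'].
doc = {
    "title": "The Lord of the Rings",
    "author_name": ["J.R.R. Tolkien"],
    "author_key": ["OL26320A"],
    "first_publish_year": 1954,
}

print(flatten_record(doc))
```

The flattened records could then be passed to pandas, for example with pd.DataFrame([flatten_record(d) for d in data['docs']]).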






**Another Simple Example**

In [None]:
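import os
import requests
import pandas as pd
import sqlite3

# Step 1: Extract Data
# Read the key from the environment rather than hardcoding it in the notebook.
# OPENWEATHER_API_KEY is an assumed variable name; set it before running.
API_KEY = os.environ['OPENWEATHER_API_KEY']
CITY = 'Gaza'
BASE_URL = 'https://api.openweathermap.org/data/2.5/weather'

response = requests.get(
    BASE_URL,
    params={'q': CITY, 'appid': API_KEY, 'units': 'metric'},
    timeout=10,
)
response.raise_for_status()  # stop here on a 4xx/5xx response
data = response.json()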

# Step 2: Transform Data
# Checking the schema and extracting relevant fields
weather_data = {
    'City': data['name'],
    'Temperature': data['main']['temp'],
    'Weather': data['weather'][0]['description'],
    'Humidity': data['main']['humidity'],
    'Pressure': data['main']['pressure'],
    'Wind Speed': data['wind']['speed']
}

# Converting to DataFrame
df = pd.DataFrame([weather_data])

# Step 3: Load Data
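# Save to SQLite database; if_exists='replace' drops and recreates the table on each run
conn = sqlite3.connect('weather_data.db')
df.to_sql('Weather', conn, if_exists='replace', index=False)
conn.close()  # release the database file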

# Display the DataFrame
print("Data Loaded Successfully!")
print(df)
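
To confirm that the load step worked, the table can be read back from SQLite. The following sketch uses an in-memory database and a hand-written payload so that it runs without an API key. It writes with the sqlite3 module directly rather than DataFrame.to_sql, and the names extract_weather and load_and_read_back, along with all sample values, are assumptions for illustration.

```python
import sqlite3


def extract_weather(payload: dict) -> dict:
    """Pick the same fields as the transform step above from a response dict."""
    return {
        "City": payload["name"],
        "Temperature": payload["main"]["temp"],
        "Weather": payload["weather"][0]["description"],
        "Humidity": payload["main"]["humidity"],
        "Pressure": payload["main"]["pressure"],
        "Wind Speed": payload["wind"]["speed"],
    }


def load_and_read_back(record: dict) -> list:
    """Write one record to an in-memory Weather table and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names so that "Wind Speed" (with a space) is valid SQL.
        columns = ", ".join(f'"{name}"' for name in record)
        placeholders = ", ".join("?" for _ in record)
        conn.execute(f"CREATE TABLE Weather ({columns})")
        conn.execute(
            f"INSERT INTO Weather VALUES ({placeholders})",
            tuple(record.values()),
        )
        conn.commit()
        return conn.execute("SELECT * FROM Weather").fetchall()
    finally:
        conn.close()


# Made-up payload in the shape the transform step expects; not real API output.
sample_payload = {
    "name": "Gaza",
    "main": {"temp": 21.5, "humidity": 60, "pressure": 1015},
    "weather": [{"description": "clear sky"}],
    "wind": {"speed": 3.2},
}

rows = load_and_read_back(extract_weather(sample_payload))
print(rows)  # [('Gaza', 21.5, 'clear sky', 60, 1015, 3.2)]
```

Against the file written above, the equivalent check would be a SELECT on the Weather table in weather_data.db, for example with pd.read_sql_query("SELECT * FROM Weather", conn).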