## Technical Questions

1. Data Modeling and Warehousing
Question: How do you design a schema for a new data-intensive application?
To design a schema for a new data-intensive application, I would first need to understand the business requirements and use cases. Then I would analyze the expected data types, relationships and query patterns. With that information, I would choose an appropriate data model, such as a relational model or a NoSQL model (document, key-value or wide-column). I would also weigh requirements such as scalability, performance and cost.

2. ETL/ELT Design and Implementation
Question: Describe a complex ETL pipeline you've designed. What were some of the challenges, and how did you address them?
I built an ETL pipeline in Python (Pandas, on Google Colab) that generated a critical report for the company. It consolidated many files describing project initiatives, each carrying a level of risk for the company, from every country where the company operated. Because these risks had not been measured or made visible before, the company had been paying large amounts in fines, and the report helped the company save a lot of money. The main challenge was that I only had Google Colab's free tier, so I had to optimize the code to meet the requirements within those resource limits.

3. Data Infrastructure and Orchestration
Question: What experience do you have with managing data infrastructure on-premises or in the cloud?
I have experience managing data infrastructure both on-premises and in the cloud. On-premises, I've worked with traditional database servers and data warehouses. In the cloud, I have experience with AWS and Google Cloud Platform services, such as S3 for storage, BigQuery for data warehousing, and NoSQL databases. That said, I consider my infrastructure experience to be basic. However, I learn quickly, I love learning new things, and I apply them right away.

4. Programming and Software Engineering Practices
Question: What programming languages are you most comfortable with, and what libraries do you frequently use in data engineering projects?
I am most comfortable with Python and SQL for data engineering projects. In Python, I frequently use Pandas for data manipulation, as well as NumPy. I have also worked with other programming languages such as Java, C# and Progress 4GL, but Python is my main language.

5. Data Security and Compliance
Question: How do you implement security measures in your data engineering projects?
To implement security measures, it is important to follow best practices such as encryption of data in transit and at rest, role-based access control, and secure authentication.
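
Two of these practices can be sketched in a few lines: keeping credentials out of source code by reading them from the environment, and binding values as query parameters instead of formatting them into SQL strings. This is a minimal illustration, not a complete security setup. It uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server (mysql-connector-python uses `%s` placeholders rather than `?`), and the `DB_PASSWORD` variable and the `find_sales_by_country` helper are hypothetical names.

```python
import os
import sqlite3

def get_db_password(var_name="DB_PASSWORD"):
    """Read the database password from an environment variable (hypothetical name)."""
    password = os.environ.get(var_name)
    if password is None:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return password

def find_sales_by_country(conn, country):
    """Hypothetical helper: country is bound as a parameter, never formatted
    into the SQL text, so input such as "x' OR '1'='1" cannot alter the query."""
    cur = conn.execute(
        "SELECT country, category, final_sales FROM sales WHERE country = ?",
        (country,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (country TEXT, category TEXT, final_sales REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("USA", "Electronics", 3749.85), ("Canada", "Books", 798.00)],
    )
    print(find_sales_by_country(conn, "USA"))           # the matching USA row
    print(find_sales_by_country(conn, "x' OR '1'='1"))  # treated as a literal value
```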

6. CI/CD Practices
Question: Please explain how you usually work with CI/CD pipelines.

I start by using a version control system, such as Git, to manage code changes. I use tools like Jenkins, GitLab CI, or CircleCI to automate the build, test, and deployment processes. The pipeline typically includes steps for code linting, unit testing, integration testing, and deployment to staging and production environments. I use Docker for containerization and Kubernetes for orchestration to ensure consistent environments across development and production.
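
To make the unit-testing stage concrete, here is a hedged sketch of the kind of small, fast test a CI job could run on every push, for example with `python -m unittest`. The `categorize_sales_volume` function is hypothetical; it mirrors the High/Medium/Low thresholds used by the ETL code later in this notebook.

```python
import unittest

def categorize_sales_volume(total_revenue):
    """Hypothetical function under test, using the ETL challenge thresholds."""
    if total_revenue > 1000:
        return "High"
    elif total_revenue > 500:
        return "Medium"
    return "Low"

class CategorizeSalesVolumeTest(unittest.TestCase):
    def test_high(self):
        self.assertEqual(categorize_sales_volume(1000.01), "High")

    def test_boundaries_are_exclusive(self):
        # Exactly 1000 is not greater than 1000, so it falls into "Medium"
        self.assertEqual(categorize_sales_volume(1000), "Medium")
        self.assertEqual(categorize_sales_volume(500), "Low")

    def test_low(self):
        self.assertEqual(categorize_sales_volume(0), "Low")

if __name__ == "__main__":
    # Run the tests directly; a CI step would usually call `python -m unittest`,
    # whose non-zero exit code on failure stops the pipeline
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CategorizeSalesVolumeTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Boundary tests like `test_boundaries_are_exclusive` are cheap to run on every commit and catch accidental changes from `>` to `>=`.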

## Code Challenge SQL

Code Challenge Description
Title: Building a MySQL Database Interface in Python

Objective:
You are tasked with creating a Python application that interfaces with a MySQL database. The application will manage a dataset representing sales data for a tech company that sells various products across multiple countries. Your goal is to establish a database connection, create a table, and populate this table with sample data.

Tasks:

Create a Database Connection:
Implement a Python function to establish a connection to a MySQL database using provided credentials (host, username, password, and database name).
Define and Create a Table:
Write SQL commands within your Python script to create a table named sales. This table should have columns for id, country, category, price, quantity, and final_sales, with appropriate data types.
Insert Data:
Prepare a series of SQL INSERT statements to populate the sales table with the provided sample data. Ensure each record accurately reflects the sales data format.
Execute Queries:
Write functions to execute SQL queries to create the table and insert data into the table. Include error handling to manage potential SQL execution errors.
Expected Deliverables:

A Python script that can be run to connect to a MySQL database, create the necessary table, and populate it with data.
Your script should handle common errors that might occur during database operations, such as connection failures or SQL syntax errors.
Evaluation Criteria:

Correctness: The script should correctly execute all database operations without errors.

Code Quality: Code should be clear, well-organized, and appropriately commented.

Error Handling: The script should effectively handle and report errors during database operations.

Efficiency: SQL operations should be written efficiently to optimize execution.

Setup Instructions
Use a local MySQL Community Server and a made-up dataset related to device sales at a tech company.
https://dev.mysql.com/downloads/mysql/

Table 1: Sales
    product_id,
    country,
    category,
    price,
    quantity,
    final_sales

Table 2: Product
    id PRIMARY KEY,
    category,
    capacity,
    color,
    screen_size,
    memory,
    other_specs
Ensure you have mysql-connector-python installed in your environment. If not, you can install it using pip install mysql-connector-python.
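
The two table definitions above imply a relationship: `sales.product_id` refers to `product.id`. The sketch below expresses that as DDL with a foreign key and runs it against an in-memory SQLite database, so it works without a MySQL server. The column types are assumptions, since the specification lists only column names; in MySQL you would use types such as `DECIMAL(10, 2)`, and the foreign key is enforced by the InnoDB engine.

```python
import sqlite3

# Column types are assumptions: the challenge lists only column names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS product (
    id          INTEGER PRIMARY KEY,
    category    TEXT NOT NULL,
    capacity    TEXT,
    color       TEXT,
    screen_size REAL,
    memory      TEXT,
    other_specs TEXT
);
CREATE TABLE IF NOT EXISTS sales (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    country     TEXT NOT NULL,
    category    TEXT NOT NULL,
    price       NUMERIC NOT NULL,
    quantity    INTEGER NOT NULL,
    final_sales NUMERIC NOT NULL
);
"""

def create_schema(conn):
    # SQLite only enforces foreign keys when this pragma is on for the connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO product (id, category, capacity, color) VALUES (1, 'Phones', '128GB', 'Black')")
    conn.execute("INSERT INTO sales VALUES (1, 'USA', 'Phones', 799.00, 3, 2397.00)")
    try:
        # product_id 99 does not exist, so the foreign key rejects this row
        conn.execute("INSERT INTO sales VALUES (99, 'USA', 'Phones', 799.00, 1, 799.00)")
    except sqlite3.IntegrityError as exc:
        print("Rejected:", exc)
```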

Tips for Success
Test each part of your script incrementally to ensure that each function behaves as expected.
Consider the edge cases, such as what happens if the table already exists or the database connection cannot be established.
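
Both edge cases can be handled explicitly. In this sketch (with SQLite standing in for MySQL), a failed connection returns `None` with a message instead of crashing, `CREATE TABLE IF NOT EXISTS` makes table creation safe to re-run, and a `UNIQUE` key combined with `INSERT OR IGNORE` keeps a re-run from duplicating rows. The `(country, category)` uniqueness rule and the helper names are assumptions for illustration; the MySQL counterparts are `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

def connect_or_none(path):
    """Open an existing database file, or return None if it cannot be opened."""
    try:
        # mode=rw fails instead of silently creating a new, empty database
        return sqlite3.connect(f"file:{path}?mode=rw", uri=True)
    except sqlite3.OperationalError as exc:
        print(f"Could not connect to {path}: {exc}")
        return None

def setup_and_load(conn, rows):
    """Create the table if needed and insert rows without duplicating them."""
    # IF NOT EXISTS: re-running setup does not fail when the table already exists
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sales (
            country     TEXT NOT NULL,
            category    TEXT NOT NULL,
            final_sales NUMERIC NOT NULL,
            UNIQUE (country, category)
        )
        """
    )
    # Rows that would violate the UNIQUE key are skipped instead of raising
    conn.executemany("INSERT OR IGNORE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    print(connect_or_none("/no/such/dir/sales.db"))  # logs the error, returns None
    conn = sqlite3.connect(":memory:")
    rows = [("USA", "Electronics", 3749.85), ("Canada", "Books", 798.00)]
    setup_and_load(conn, rows)
    setup_and_load(conn, rows)  # second run: no error and no duplicates
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```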

This challenge is designed to test your ability to integrate Python programming with SQL database management, reflecting tasks you may handle as a data engineer in our organization. Good luck!


In [None]:
!pip install mysql-connector-python

## Create a MySQL Database using Python
### First, you need to connect to your MySQL server and create a new database.


In [None]:
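import os

import mysql.connector
from mysql.connector import Error

# Function to establish a connection to the MySQL server.
# If db_name is given, the connection uses that database; otherwise it
# connects to the server only (needed to create the database first).
def create_connection(host_name, user_name, user_password, db_name=None):
    connection = None
    connection_args = {
        "host": host_name,
        "user": user_name,
        "password": user_password,
    }
    if db_name:
        connection_args["database"] = db_name
    try:
        connection = mysql.connector.connect(**connection_args)
        print("Connection to MySQL DB successful")
    except Error as e:
        print(f"The error '{e}' occurred")
    return connection

# Function to execute SQL statements that do not return rows (DDL, INSERT, ...)
def execute_query(connection, query):
    if connection is None:
        print("No database connection available; query not executed")
        return
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        connection.commit()
        print("Query executed successfully")
    except Error as e:
        print(f"The error '{e}' occurred")
    finally:
        cursor.close()

# Define the database credentials.
# The password is read from the MYSQL_PASSWORD environment variable
# instead of being hard-coded in the notebook.
host = "localhost"
user = "root"
password = os.environ.get("MYSQL_PASSWORD", "")
database = "sales_db"

# Create the database if it does not exist yet
server_connection = create_connection(host, user, password)
execute_query(server_connection, f"CREATE DATABASE IF NOT EXISTS {database}")
if server_connection:
    server_connection.close()

# Create a connection to the sales_db database
connection = create_connection(host, user, password, database)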

In [None]:
# Define the create table query
create_sales_table = """
CREATE TABLE IF NOT EXISTS sales (
    id INT AUTO_INCREMENT,
    country VARCHAR(255) NOT NULL,
    category VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    quantity INT NOT NULL,
    final_sales DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id)
);
"""

# Execute the create table query
execute_query(connection, create_sales_table)

## Insert Data into the sales Table
### The following Python script shows how to insert data into the sales table. We'll be adding rows using a batch insert for efficiency.

In [None]:
# Sample data to insert into the sales table
sales_data = """
INSERT INTO sales (country, category, price, quantity, final_sales)
VALUES
  ('USA', 'Electronics', 249.99, 15, 3749.85),
  ('Canada', 'Books', 19.95, 40, 798.00),
  ('Germany', 'Clothing', 89.99, 22, 1979.78),
  ('France', 'Home Decor', 59.95, 28, 1678.60),
  ('Japan', 'Toys', 24.99, 35, 874.65),
  ('Australia', 'Sports', 99.00, 18, 1782.00),
  ('Brazil', 'Electronics', 179.99, 12, 2159.88),
  ('Spain', 'Furniture', 399.00, 5, 1995.00),
  ('China', 'Books', 9.99, 65, 649.35),
  ('Italy', 'Clothing', 119.99, 16, 1919.84),
  ('UK', 'Home Decor', 79.95, 20, 1599.00),
  ('Mexico', 'Toys', 39.99, 30, 1199.70),
  ('India', 'Sports', 49.99, 25, 1249.75),
  ('Russia', 'Electronics', 299.99, 8, 2399.92),
  ('South Africa', 'Furniture', 249.99, 7, 1749.93),
  ('Turkey', 'Books', 14.99, 45, 674.55),
  ('Saudi Arabia', 'Clothing', 99.99, 18, 1799.82),
  ('Netherlands', 'Home Decor', 44.95, 32, 1438.40),
  ('Belgium', 'Toys', 29.99, 40, 1199.60),
  ('Switzerland', 'Electronics', 399.99, 6, 2399.94),
  ('Sweden', 'Furniture', 299.99, 9, 2699.91),
  ('Norway', 'Books', 24.95, 38, 948.10),
  ('Denmark', 'Clothing', 79.99, 24, 1919.76),
  ('Finland', 'Home Decor', 69.95, 26, 1818.70),
  ('Poland', 'Sports', 89.99, 20, 1799.80);
"""

# Execute the insert data query
execute_query(connection, sales_data)
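
The cell above sends one literal multi-row `INSERT`, which works well for a fixed sample. When rows come from Python data, a common alternative is to bind them as parameters and send them in fixed-size batches. A minimal sketch follows, using SQLite as a stand-in; with mysql-connector-python the same pattern uses `cursor.executemany()` with `%s` placeholders, and the connector batches `INSERT` statements into multiple-row syntax. The `insert_in_batches` name and the default batch size are illustrative choices.

```python
import sqlite3
from itertools import islice

INSERT_SALES = (
    "INSERT INTO sales (country, category, price, quantity, final_sales) "
    "VALUES (?, ?, ?, ?, ?)"
)

def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows in batches of batch_size and return the number inserted.

    Values are bound as parameters (never formatted into the SQL string),
    and each batch is committed once, so transaction size stays bounded.
    """
    rows = iter(rows)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(INSERT_SALES, batch)
        conn.commit()
        total += len(batch)
    return total

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY AUTOINCREMENT, country TEXT, "
        "category TEXT, price NUMERIC, quantity INTEGER, final_sales NUMERIC)"
    )
    sample = [("USA", "Electronics", 249.99, 15, 3749.85), ("Canada", "Books", 19.95, 40, 798.00)]
    print(insert_in_batches(conn, sample, batch_size=1))
```

Committing per batch keeps each transaction small, but a failure part-way through leaves earlier batches committed; if the load must be all-or-nothing, commit once at the end instead.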

## Code to Create the product Table
### First, here's the SQL command to create the product table with various specifications:

In [None]:
create_product_table = """
CREATE TABLE IF NOT EXISTS product (
    product_id INT AUTO_INCREMENT,
    product_name VARCHAR(255) NOT NULL,
    category VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    specifications TEXT NOT NULL,
    country_of_origin VARCHAR(255) NOT NULL,
    PRIMARY KEY (product_id)
);
"""

## Python Code to Execute the Table Creation and Insert Data
### Now, let's integrate this into the Python script to create the table and then populate it with some dummy sample data:

In [None]:
# Execute the create table query
execute_query(connection, create_product_table)

# Sample data to insert into the product table
product_data = """
INSERT INTO product (product_name, category, price, specifications, country_of_origin)
VALUES
   ('Laptop', 'Electronics', 1000.00, '16GB RAM, 512GB SSD', 'USA'),
   ('Sofa', 'Furniture', 500.00, 'Leather, Brown', 'Canada'),
   ('Toy Car', 'Toys', 20.00, 'Plastic, Red', 'Germany'),
   ('Smartphone', 'Electronics', 800.00, '6GB RAM, 128GB Storage', 'South Korea'),
   ('Dining Table', 'Furniture', 300.00, 'Wood, Rectangular', 'Italy'),
   ('Board Game', 'Toys', 30.00, 'Cardboard, Family-friendly', 'USA'),
   ('Headphones', 'Electronics', 100.00, 'Noise-cancelling, Wireless', 'Japan'),
   ('Armchair', 'Furniture', 200.00, 'Fabric, Blue', 'Spain'),
   ('Puzzle', 'Toys', 15.00, '1000 pieces, Landscape', 'Germany'),
   ('Tablet', 'Electronics', 600.00, '10.5-inch display, 64GB Storage', 'China'),
   ('Coffee Table', 'Furniture', 150.00, 'Glass top, Metal frame', 'USA'),
   ('Stuffed Animal', 'Toys', 10.00, 'Plush, Teddy Bear', 'Mexico'),
   ('Television', 'Electronics', 1200.00, '55-inch, 4K Smart TV', 'South Korea'),
   ('Bed Frame', 'Furniture', 400.00, 'Metal, Queen size', 'Canada'),
   ('Action Figure', 'Toys', 25.00, 'Plastic, Superhero', 'USA'),
   ('Laptop Bag', 'Electronics', 50.00, 'Nylon, Water-resistant', 'Taiwan'),
   ('Dining Chairs', 'Furniture', 80.00, 'Wood, Set of 4', 'Italy'),
   ('Lego Set', 'Toys', 40.00, '500 pieces, Space theme', 'Denmark'),
   ('Smart Watch', 'Electronics', 300.00, 'Fitness tracking, Waterproof', 'USA'),
   ('Bookshelf', 'Furniture', 120.00, 'Wood, 5 shelves', 'Poland'),
   ('Dollhouse', 'Toys', 60.00, 'Wood, Furnished', 'Germany'),
   ('Wireless Speaker', 'Electronics', 80.00, 'Bluetooth, Portable', 'China'),
   ('Recliner', 'Furniture', 350.00, 'Leather, Power lift', 'Mexico'),
   ('Drones', 'Toys', 100.00, 'Remote controlled, HD Camera', 'USA'),
   ('Laptop Cooling Pad', 'Electronics', 30.00, 'Adjustable, USB-powered', 'Taiwan');
"""

In [None]:
# Execute the insert data query
execute_query(connection, product_data)

## Use Case: Detailed Sales Analysis
### Objective:

Determine the top-selling product categories in each country.
Retrieve detailed product specifications for these top-selling products.
Provide additional insights like the total number of distinct products sold and the maximum sales recorded for each category.

In [None]:
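# Function to perform detailed sales analysis.
# Note: the sales table has no product_id column, so sales rows are matched to
# products by category. The inner join therefore drops sales categories with no
# products (Books, Clothing, Home Decor and Sports in the sample data), and
# distinct_products_sold counts the catalog products in each matched category.
def detailed_sales_analysis(connection):
    query = """
    SELECT
        s.country,
        s.category,
        COUNT(DISTINCT p.product_id) AS distinct_products_sold,
        MAX(s.final_sales) AS max_sales,
        GROUP_CONCAT(p.product_name ORDER BY p.product_id) AS product_names,
        GROUP_CONCAT(p.specifications ORDER BY p.product_id SEPARATOR '; ') AS specifications
    FROM
        sales s
    JOIN
        product p ON s.category = p.category
    GROUP BY
        s.country, s.category
    ORDER BY
        max_sales DESC;
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        result = cursor.fetchall()
        for row in result:
            print(f"Country: {row[0]}, Category: {row[1]}, Distinct Products Sold: {row[2]}, Max Sales: {row[3]}, Product Names: {row[4]}, Specifications: {row[5]}")
    except Error as e:
        print(f"The error '{e}' occurred")
    finally:
        cursor.close()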

# Perform detailed sales analysis
detailed_sales_analysis(connection)
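
The query above orders groups by their largest sale but does not pick a single top category per country, which is the first objective of this use case. One way to do that is a window function that ranks categories within each country. A sketch follows, using SQLite (3.25 or later) as a stand-in; the same SQL should also run on MySQL 8.0+, which supports CTEs and `ROW_NUMBER()`. The function name is hypothetical.

```python
import sqlite3

# Total sales per (country, category), then rank categories inside each country
TOP_CATEGORY_PER_COUNTRY = """
WITH category_totals AS (
    SELECT country, category, SUM(final_sales) AS total_sales
    FROM sales
    GROUP BY country, category
),
ranked AS (
    SELECT
        country,
        category,
        total_sales,
        ROW_NUMBER() OVER (PARTITION BY country ORDER BY total_sales DESC) AS rank_in_country
    FROM category_totals
)
SELECT country, category, total_sales
FROM ranked
WHERE rank_in_country = 1
ORDER BY total_sales DESC
"""

def top_category_per_country(conn):
    """Return (country, category, total_sales) for the best category of each country."""
    return conn.execute(TOP_CATEGORY_PER_COUNTRY).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (country TEXT, category TEXT, final_sales NUMERIC)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("USA", "Electronics", 100), ("USA", "Books", 300), ("Canada", "Toys", 50)],
    )
    for row in top_category_per_country(conn):
        print(row)
```

`ROW_NUMBER()` keeps exactly one category per country even when two categories tie; `RANK()` would keep all tied categories instead.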


## Code Challenge ETL in Python
### ETL Code Challenge Description
#### Title: ETL Process Simulation for Tech Company Sales Data

Objective:
Develop a Python-based ETL (Extract, Transform, Load) process that integrates data from multiple sources, applies specific transformations, and then loads the transformed data into a new table. This challenge tests your ability to handle data programmatically, showcasing your skills in data manipulation, SQL integration, and Python programming.

Background:
A tech company has multiple tables storing sales and product details. The sales table records transactions including the country, product category, and sales details. The product table includes specifications like capacity and color. Your task is to extract data from these tables, apply transformations to derive new insights, and load the results into a new structured format.

Tasks:

Extract:
Write a Python function to retrieve data from the existing sales and product tables. The extracted data should include country, category, product capacity, color, quantity sold, and final sales amount.
Transform:
Implement transformations to calculate the total revenue for each product (defined as quantity * final_sales).
Categorize each transaction based on sales volume into 'High', 'Medium', or 'Low'.
Load:
Design and create a new table called transformed_sales to store the transformed data.
Load the transformed data into this table with appropriate field names and data types.
Expected Deliverables:

A Python script that implements the ETL process.
The script should include functions for connecting to a MySQL database, executing SQL queries, and handling any potential errors.
Documentation within the script explaining the purpose and functionality of each part of the code.
Evaluation Criteria:

Correctness: The script should correctly execute all steps of the ETL process without errors.
Efficiency: Code and queries should be optimized for performance, especially when handling large datasets.
Code Quality: The code should be well-organized, properly commented, and easy to read.
Error Handling: The script should include robust error handling to manage and log potential issues during the database operations.
Instructions for Execution:

Use a local MySQL Community Server, a made-up dataset related to device sales at a tech company, and the tables created in the SQL section.



Overview of the ETL Process
Here's how we can structure the ETL process for your dataset:

Extract: Retrieve data from the sales and product tables.
Transform: Apply transformations to the data, such as computing additional metrics or modifying the format.
Load: Load the transformed data into a new table or update the existing tables.
1. Extract Data
First, extract data from the MySQL database using the previously established connection and query functions.

In [None]:
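# Function to fetch rows from MySQL as a list of dictionaries
def fetch_query(connection, query):
    cursor = connection.cursor(dictionary=True)
    try:
        cursor.execute(query)
        return cursor.fetchall()
    except Error as e:
        print(f"The error '{e}' occurred")
        return None
    finally:
        cursor.close()

# Extract sales rows together with product details.
# The product table created earlier has no capacity or color columns, so the
# free-text specifications column is extracted instead; rows are matched by
# category because the sales table has no product_id column.
def extract_data(connection):
    sales_query = """
    SELECT
        s.country,
        s.category,
        p.product_name,
        p.specifications,
        s.quantity,
        s.final_sales
    FROM
        sales s
    JOIN
        product p ON s.category = p.category;
    """
    return fetch_query(connection, sales_query)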


2. Transform Data
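We will create a simple transformation function that calculates the total revenue per row, using the challenge's definition (quantity * final_sales), and categorizes each row by sales volume. Note that in the sample data final_sales already equals price * quantity, so this metric multiplies by quantity a second time.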

In [None]:
def transform_data(sales_data):
    transformed_data = []
    for row in sales_data:
        total_revenue = row['quantity'] * row['final_sales']
        if total_revenue > 1000:
            sales_volume = 'High'
        elif total_revenue > 500:
            sales_volume = 'Medium'
        else:
            sales_volume = 'Low'
        
        transformed_row = {
            "country": row['country'],
            "category": row['category'],
            "product_name": row['product_name'],
            "specifications": row['specifications'],
            "quantity_sold": row['quantity'],
            "total_revenue": total_revenue,
            "sales_volume": sales_volume
        }
        transformed_data.append(transformed_row)
    return transformed_data
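
With the sample data, every `quantity * final_sales` value in the joined rows is above 1,000 (the smallest is 5 * 1995.00 = 9,975.00), so the fixed thresholds label every row 'High'. If a more balanced split is wanted, one option, a swapped-in technique rather than part of the challenge specification, is to derive the cut points from the data with `statistics.quantiles`, here as terciles. The helper names are hypothetical; `statistics.quantiles` needs Python 3.8+ and, before 3.13, at least two values.

```python
import statistics

def tercile_cutoffs(values):
    """Return (low_cut, high_cut) splitting values into three roughly equal groups."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    return low_cut, high_cut

def categorize_by_cutoffs(total_revenue, low_cut, high_cut):
    """Label a revenue value relative to data-derived cut points."""
    if total_revenue <= low_cut:
        return "Low"
    if total_revenue <= high_cut:
        return "Medium"
    return "High"

if __name__ == "__main__":
    # A few total_revenue values computed from the sample sales rows
    revenues = [9975.0, 12249.51, 14399.64, 19199.36, 24299.19, 25918.56]
    low_cut, high_cut = tercile_cutoffs(revenues)
    for value in revenues:
        print(value, categorize_by_cutoffs(value, low_cut, high_cut))
```

In `transform_data`, the cut points would be computed once from all rows' `total_revenue` values and then applied to each row, instead of the fixed 1,000 and 500 thresholds.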


3. Load Data
Finally, write the transformed data back into a new table or an existing one. Here, let's assume we are creating a new table to store these results.

In [None]:
def load_data(connection, transformed_data):
    # Create the target table if it does not exist yet
    create_transformed_sales_table = """
    CREATE TABLE IF NOT EXISTS transformed_sales (
        id INT AUTO_INCREMENT,
        country VARCHAR(255) NOT NULL,
        category VARCHAR(255) NOT NULL,
        product_name VARCHAR(255) NOT NULL,
        specifications TEXT NOT NULL,
        quantity_sold INT NOT NULL,
        total_revenue DECIMAL(10, 2) NOT NULL,
        sales_volume VARCHAR(50) NOT NULL,
        PRIMARY KEY (id)
    );
    """
    execute_query(connection, create_transformed_sales_table)

    insert_query = """
    INSERT INTO transformed_sales (country, category, product_name, specifications, quantity_sold, total_revenue, sales_volume)
    VALUES (%s, %s, %s, %s, %s, %s, %s)
    """
    # Build the parameter tuples once and send them as a single batch
    rows = [
        (row['country'], row['category'], row['product_name'], row['specifications'],
         row['quantity_sold'], row['total_revenue'], row['sales_volume'])
        for row in transformed_data
    ]
    cursor = connection.cursor()
    try:
        cursor.executemany(insert_query, rows)
        connection.commit()
        print(f"Loaded {len(rows)} rows into transformed_sales table")
    except Error as e:
        # Undo the partial load so the table is not left half-populated
        connection.rollback()
        print(f"The error '{e}' occurred")
    finally:
        cursor.close()
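
Loading is safer when it is all-or-nothing: if one row fails, the rows already inserted in that run should be rolled back rather than left half-loaded. A sketch of that pattern with SQLite as a stand-in follows; `with conn:` commits on success and rolls back if an exception escapes. With mysql-connector-python the equivalent is calling `connection.commit()` after the batch and `connection.rollback()` in the error handler, on a transactional engine such as InnoDB. The function name and the reduced three-column table are assumptions.

```python
import sqlite3

def load_atomically(conn, rows):
    """Insert all rows in a single transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO transformed_sales (country, total_revenue, sales_volume) "
                "VALUES (?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.Error as exc:
        print(f"Load failed and was rolled back: {exc}")
        return False

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transformed_sales (country TEXT NOT NULL, "
        "total_revenue NUMERIC NOT NULL, sales_volume TEXT NOT NULL)"
    )
    # The second row violates NOT NULL, so the first row is rolled back as well
    print(load_atomically(conn, [("USA", 56247.75, "High"), (None, 100.0, "Low")]))
    print(conn.execute("SELECT COUNT(*) FROM transformed_sales").fetchone()[0])
```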


Execution of ETL Process
Now, combine these functions to perform the complete ETL process.

In [None]:
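import os

def run_etl_process():
    # Define the database credentials (password read from the environment)
    host = "localhost"
    user = "root"
    password = os.environ.get("MYSQL_PASSWORD", "")
    database = "sales_db"

    # Create a database connection
    connection = create_connection(host, user, password, database)
    if connection is None:
        print("ETL aborted: could not connect to the database")
        return

    try:
        # Extract data
        sales_data = extract_data(connection)
        if sales_data is None:
            print("ETL aborted: extraction failed")
            return

        # Transform data
        transformed_data = transform_data(sales_data)

        # Load data
        load_data(connection, transformed_data)
    finally:
        # Always release the connection, even if a step fails
        connection.close()

if __name__ == "__main__":
    run_etl_process()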


Conclusion
This simulated ETL process in Python demonstrates how to extract data from a relational database, apply transformations, and load the processed data into a new table, providing practical hands-on experience with ETL concepts.

## Code Challenge Airflow

Title: Developing an Airflow DAG for an Automated ETL Process

Objective:
The goal of this challenge is to develop a fully functional Airflow Directed Acyclic Graph (DAG) that orchestrates an ETL (Extract, Transform, Load) process. This process involves extracting data from a source, transforming this data, and loading it into a destination system.

Background:
Automating ETL tasks is crucial for ensuring data accuracy and availability in real-time or near-real-time for analysis and decision-making. Airflow is a platform used to programmatically author, schedule, and monitor workflows.

Tasks:

Complete the Python Functions:
Extract Function: Implement logic to extract data from a predefined data source. This could be a database, a file, an API, or any simulated data source.
Transform Function: Apply necessary data transformations which could include cleaning, aggregating, or any other form of data manipulation.
Load Function: Implement the logic to load the transformed data into a specified target, which could be a database or a data warehouse.
Integrate Functions with Airflow:
Use the provided Airflow DAG skeleton to integrate your Python functions.
Configure the DAG to ensure that tasks are executed in the correct order, handling dependencies correctly.
Expected Deliverables:

A fully implemented Airflow DAG named etl_process_dag with the specified tasks (extract, transform, load).
Detailed documentation on:
The data source and data format expected.
The specific transformations applied.
The destination system and data schema.
Evaluation Criteria:

Functionality: The DAG should execute without errors, and data should flow through the ETL process as intended.
Code Quality: Code should be clean, well-commented, and follow best practices for Python and Airflow.
Error Handling: Adequate error handling should be in place to manage common failures in data extraction, transformation, and loading.
Scalability and Maintainability: The solution should be scalable and easy to maintain or modify.


In [None]:
import os

import mysql.connector
from mysql.connector import Error

def extract_data():
    """Extract all rows from the sales and product tables as lists of dicts."""
    connection = None
    cursor = None
    try:
        connection = mysql.connector.connect(
            host='localhost',
            database='sales_db',
            user='root',
            password=os.environ.get('MYSQL_PASSWORD', '')
        )
        cursor = connection.cursor(dictionary=True)
        cursor.execute("SELECT * FROM sales")
        sales_data = cursor.fetchall()
        cursor.execute("SELECT * FROM product")
        product_data = cursor.fetchall()
        return sales_data, product_data
    except Error as e:
        print(f"Error: {e}")
        return None, None
    finally:
        # connection and cursor stay None if connect() itself failed
        if cursor is not None:
            cursor.close()
        if connection is not None and connection.is_connected():
            connection.close()


In [None]:
def transform_data(sales_data, product_data):
    # Index products by category once so each sale only visits the products
    # in its own category, instead of scanning the whole product list
    products_by_category = {}
    for product in product_data:
        products_by_category.setdefault(product['category'], []).append(product)

    transformed_data = []
    for sale in sales_data:
        for product in products_by_category.get(sale['category'], []):
            total_revenue = sale['quantity'] * sale['final_sales']
            if total_revenue > 1000:
                sales_volume = 'High'
            elif total_revenue > 500:
                sales_volume = 'Medium'
            else:
                sales_volume = 'Low'

            transformed_row = {
                "country": sale['country'],
                "category": sale['category'],
                "product_name": product['product_name'],
                "specifications": product['specifications'],
                "quantity_sold": sale['quantity'],
                "total_revenue": total_revenue,
                "sales_volume": sales_volume
            }
            transformed_data.append(transformed_row)
    return transformed_data


In [None]:
!pip install "apache-airflow[postgres,mysql]==2.0.0"


In [None]:
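import os
from decimal import Decimal

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

# We define the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
}

dag = DAG(
    'etl_process_dag',
    default_args=default_args,
    description='A simple ETL DAG',
    schedule_interval='@daily',
    start_date=days_ago(1),
)

# Airflow 2 serializes XCom values as JSON by default, and MySQL DECIMAL
# columns come back as decimal.Decimal, which JSON cannot encode.
def to_json_safe(rows):
    return [
        {key: float(value) if isinstance(value, Decimal) else value for key, value in row.items()}
        for row in rows
    ]

# We define the task callables. In Airflow 2 the task context (including
# 'ti', the task instance) is passed to **kwargs automatically.
def extract_task(**kwargs):
    sales_data, product_data = extract_data()
    if sales_data is None or product_data is None:
        # Raising marks the task as failed, so Airflow applies the retry policy
        raise RuntimeError("Extraction from MySQL failed")
    ti = kwargs['ti']
    ti.xcom_push(key='sales_data', value=to_json_safe(sales_data))
    ti.xcom_push(key='product_data', value=to_json_safe(product_data))

def transform_task(**kwargs):
    ti = kwargs['ti']
    sales_data = ti.xcom_pull(key='sales_data', task_ids='extract')
    product_data = ti.xcom_pull(key='product_data', task_ids='extract')
    transformed_data = transform_data(sales_data, product_data)
    ti.xcom_push(key='transformed_data', value=transformed_data)

def load_task(**kwargs):
    ti = kwargs['ti']
    transformed_data = ti.xcom_pull(key='transformed_data', task_ids='transform')
    # load_data() from the ETL section expects an open connection first
    connection = create_connection("localhost", "root", os.environ.get("MYSQL_PASSWORD", ""), "sales_db")
    if connection is None:
        raise RuntimeError("Could not connect to MySQL to load data")
    try:
        load_data(connection, transformed_data)
    finally:
        connection.close()

extract = PythonOperator(
    task_id='extract',
    python_callable=extract_task,
    dag=dag,
)

transform = PythonOperator(
    task_id='transform',
    python_callable=transform_task,
    dag=dag,
)

load = PythonOperator(
    task_id='load',
    python_callable=load_task,
    dag=dag,
)

# We define the dependencies
extract >> transform >> load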


## Code Challenge API

API Data Integration Code Challenge Description
Title: API Data Handling and Integration Challenge

Objective:
Develop a Python application that fetches data from an external API, applies specified transformations, and outputs the processed data. This challenge is designed to test your abilities in API interaction, data manipulation, and the application of basic data processing principles in Python.

Background:
APIs are a crucial data source in many software systems and data pipelines. Effective handling and integration of API data are key skills for developers and data engineers, involving tasks such as data extraction, transformation, and preparation for further analysis or storage.

Tasks:

Extract Data:
Write a Python function to fetch data from a given API endpoint. This function should handle network errors, API rate limits, and other common issues that can occur during API interaction.
Transform Data:
Implement logic to transform the raw data fetched from the API. Assume the data includes various product details; extract and format this data into a structured JSON format that focuses on specific fields like product_id, product_name, category, and price.
Output Data:
Instead of loading the data into a database or storage system, output the transformed data to the console or a file in a clean, readable format. This simulates the final step in an ETL process where data is made available for further use.
Expected Deliverables:

A Python script that efficiently and correctly extracts, transforms, and outputs data as described.
Effective use of exception handling to manage potential errors during the API request.
Logging throughout the process to track operations and facilitate debugging and monitoring.
Evaluation Criteria:

Correctness and Completeness: The script should correctly fetch and process the API data according to the specifications provided.
Error Handling: Robust handling of errors and exceptional conditions in the API interaction.
Code Quality: The code should be clean, well-organized, commented, and follow best practices for Python development.
Output Formatting: The transformed data should be outputted in a structured and readable format, demonstrating an understanding of data presentation.
Instructions for Setup and Execution:

Ensure the requests library is installed in your Python environment (logging is part of the Python standard library).
Use any public API endpoint you prefer; one related to a tech company is preferable.



In [None]:
import json
import logging

import requests

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Function to extract data from the API
def fetch_api_data(url, timeout=10):
    try:
        # The timeout keeps the request from hanging indefinitely
        response = requests.get(url, timeout=timeout)
        if response.status_code == 429:
            logging.warning("API rate limit reached (HTTP 429).")
        response.raise_for_status()  # Raise an exception for HTTP 4xx/5xx errors
        data = response.json()
        logging.info("Data fetched successfully from API.")
        return data
    except requests.exceptions.RequestException as e:
        logging.error(f"Error fetching data from API: {e}")
        return None
    except ValueError as e:
        # Raised by response.json() when the body is not valid JSON
        logging.error(f"Invalid JSON in API response: {e}")
        return None

# Function to transform data
def transform_data(data):
    transformed_data = []
    for item in data:
        transformed_item = {
            "product_id": item.get("id"),
            # The Fake Store API names this field "title", not "name"
            "product_name": item.get("title"),
            "category": item.get("category"),
            "price": item.get("price")
        }
        transformed_data.append(transformed_item)
    logging.info(f"Transformed {len(transformed_data)} records.")
    return transformed_data

# Function to output data
def output_data(data, output_file):
    try:
        with open(output_file, 'w', encoding='utf-8') as file:
            json.dump(data, file, indent=4)
        logging.info(f"Data successfully written to {output_file}")
    except OSError as e:
        logging.error(f"Error writing data to file: {e}")

# Main function
def main():
    api_url = "https://fakestoreapi.com/products"  # Example public API URL
    output_file = "transformed_data.json"

    # Extract data from the API
    raw_data = fetch_api_data(api_url)
    if raw_data:
        # Transform data
        transformed_data = transform_data(raw_data)
        # Output data
        output_data(transformed_data, output_file)
    else:
        logging.warning("No data received from the API; nothing to output.")

if __name__ == "__main__":
    main()
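
The challenge also asks for handling API rate limits. A common approach is to retry with exponential backoff when the server answers HTTP 429, honoring its `Retry-After` hint when it is given in seconds. The sketch below keeps the retry logic independent of any HTTP library so it can be tested offline; `RateLimitedError` and `call_with_backoff` are hypothetical names, and a real `fetch` function would raise `RateLimitedError` when `response.status_code == 429`.

```python
import time

class RateLimitedError(Exception):
    """Hypothetical error a fetch function raises on HTTP 429 (Too Many Requests)."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited by the API")
        self.retry_after = retry_after  # seconds to wait, if the server said so

def call_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() and retry on RateLimitedError with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts,
    unless the error carries a retry_after value, which takes precedence.
    Re-raises the last error after max_attempts attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimitedError as exc:
            if attempt == max_attempts:
                raise
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)

if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_fetch():
        # Simulated endpoint: rate-limited twice, then succeeds
        calls["count"] += 1
        if calls["count"] < 3:
            raise RateLimitedError()
        return [{"id": 1, "title": "Example product"}]

    print(call_with_backoff(flaky_fetch, sleep=lambda s: print(f"waiting {s}s")))
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting.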
