# Importing Libraries

## Purpose

Imports the required libraries to run the application, including:

- **`requests`**: For making HTTP requests to the OpenWeatherMap API.
- **`json`**: For parsing and handling JSON data.
- **`dotenv`**: For securely loading environment variables from a `.env` file (installed as the `python-dotenv` package).
- **`logging`**: For logging application events and errors.
- **`pandas`**: For data manipulation and saving to CSV.
- **`datetime`**: For handling and formatting dates and times.
- **`os`**: For interacting with the operating system (e.g., file paths).
- **`tenacity`**: For retrying API requests in case of failures.
- **`sys`**: For system-specific parameters and functions.

## Details

- All libraries are imported at the start to ensure availability for all functions.
- The imports are organized with comments explaining their purpose, enhancing readability.
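Before running the notebook, it can help to confirm that the third-party modules are actually installed. The sketch below is a hypothetical pre-flight helper, not part of the application. It assumes the usual PyPI distribution names (`requests`, `python-dotenv`, `pandas`, `tenacity`) and relies on `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found.

```python
import importlib.util

# Assumed mapping from import name to PyPI distribution name
THIRD_PARTY = {
    'requests': 'requests',
    'dotenv': 'python-dotenv',
    'pandas': 'pandas',
    'tenacity': 'tenacity',
}


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be imported (hypothetical helper)."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = find_missing_modules(THIRD_PARTY)
    if missing:
        packages = ' '.join(THIRD_PARTY[name] for name in missing)
        print(f"Missing modules; try: pip install {packages}")
    else:
        print("All third-party modules are installed")
```

The mapping matters mainly for `dotenv`, whose import name differs from the package you install.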


In [1]:
import requests  # For making HTTP requests to the OpenWeatherMap API
import json  # For parsing and handling JSON data
import dotenv  # For securely loading environment variables from a .env file
import logging  # For logging application events and errors
import pandas as pd  # For data manipulation and saving data to CSV
from datetime import datetime, timezone, timedelta  # For handling and formatting date and time
import os  # For interacting with the operating system (e.g., file paths, environment variables)
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type  # For retrying API requests
import sys  # For system-specific parameters and functions

# Logger Setup

## Purpose

Configures a logger to record application events and errors:

- Logs messages to a file (`app.log`); the file is opened with `mode='w'`, so it is overwritten on each run.
- Outputs warnings and errors to the console for immediate visibility.
- Supports dynamic log levels (e.g., `DEBUG`, `INFO`) via the `LOG_LEVEL` environment variable.

## Details

The `set_logger` function:

- Takes a **file name** and **log level** as optional parameters.
- Clears existing handlers to avoid duplicates.
- Sets up:
  - A **file handler** (handler level `DEBUG`, so it records every message the logger's own level lets through).
  - A **stream handler** (logs warnings and above).
- Uses a formatted log message with:
  - Timestamp (time of day, `HH:MM:SS`)
  - Log level
  - Function name
  - Message content
- Defaults to `INFO` level if the specified log level is invalid.
- Logs initialization details for traceability.
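The level lookup described above can be sketched as a standalone function. This is a hypothetical helper, not the notebook's code: it uses `logging.getLevelName`, which maps a registered name such as `'DEBUG'` to its numeric value and returns a string for unknown names, so anything that is not an `int` falls back to `INFO`. It also upper-cases the explicit argument, not just the environment variable.

```python
import logging
import os
from typing import Optional


def resolve_log_level(log_level: Optional[str] = None) -> int:
    """Map a level name (argument first, then LOG_LEVEL, then 'INFO') to a logging constant."""
    name = (log_level or os.getenv('LOG_LEVEL', 'INFO')).upper()
    # For a registered name such as 'DEBUG', getLevelName returns its numeric value;
    # for an unknown name it returns the string 'Level <name>', which is not an int
    level = logging.getLevelName(name)
    return level if isinstance(level, int) else logging.INFO


print(resolve_log_level('debug'))    # 10 (logging.DEBUG)
print(resolve_log_level('verbose'))  # 20 (falls back to logging.INFO)
```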


In [2]:
def set_logger(file_name: str = 'app.log', log_level: str = None) -> logging.Logger:
    """
    Configures and returns a logger for the application.

    - Logs messages to a file for persistent records.
    - Outputs warnings and errors to the console for immediate visibility.
    - Supports dynamic log levels (e.g., DEBUG, INFO, WARNING) via environment variables.

    Args:
        file_name (str): Name of the log file. Defaults to 'app.log'.
        log_level (str): Logging level (e.g., 'DEBUG', 'INFO'). Defaults to 'INFO' or env variable 'LOG_LEVEL'.

    Returns:
        logging.Logger: Configured logger instance.
    """
    # Create a logger with the module's name
    logger = logging.getLogger(__name__)

    # Close and remove existing handlers so re-running the cell does not duplicate output
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.propagate = False  # Don't also pass records to root handlers (another source of duplicates)

    # Set the logging level from the argument or the LOG_LEVEL environment variable
    log_level = (log_level or os.getenv('LOG_LEVEL', 'INFO')).upper()
    level = getattr(logging, log_level, None)
    invalid_level = not isinstance(level, int)  # Rejects unknown names and non-level attributes
    logger.setLevel(logging.INFO if invalid_level else level)

    # Define a formatter for log messages
    formatter = logging.Formatter(
        '%(asctime)s - %(levelname)s - %(funcName)s - %(message)s',
        datefmt='%H:%M:%S'
    )

    # File handler: writes every record the logger passes (the file is overwritten on each run)
    file_handler = logging.FileHandler(file_name, mode='w', encoding='utf-8')
    file_handler.setFormatter(formatter)
    file_handler.setLevel(logging.DEBUG)

    # Stream handler: Outputs warnings and errors to the console
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)
    stream_handler.setLevel(logging.WARNING)

    # Add handlers to the logger
    logger.addHandler(file_handler)
    logger.addHandler(stream_handler)

    # Report an invalid level only now, so the warning reaches both handlers
    if invalid_level:
        logger.warning(f"Invalid log level '{log_level}', defaulting to INFO")
        log_level = 'INFO'

    logger.info(f"Logger initialized with level {log_level} and output to {file_name}")
    return logger

# Loading Environment Variables

## Purpose

Loads the OpenWeatherMap API key from a `.env` file to:

- Ensure secure management of sensitive data.
- Validate the API key format and log errors if invalid.

## Details

### Process Flow:

1. Uses `dotenv.load_dotenv()` to load variables from `.env`.
2. Retrieves the `app_id` (API key) from environment variables.

### Validation Checks:

- **Format**: Verifies the key is a non-empty string.
- **Length**: Confirms the key is 32 characters long.

### Error Handling:

- Raises `FileNotFoundError` if the `.env` file is missing or defines no variables.
- Raises `ValueError` for invalid API key formats.

### Logging:

- Logs success or failure with descriptive messages for debugging.
- Includes warnings for missing/invalid keys.
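The validation checks above can be isolated into a small function that is easy to test without a `.env` file. This is a hypothetical sketch: it strips surrounding whitespace before applying the 32-character rule from this section and treats an unset variable (`None`) like an empty string.

```python
from typing import Optional

EXPECTED_KEY_LENGTH = 32  # Length rule taken from the validation checks above


def validate_api_key(raw_key: Optional[str]) -> str:
    """Strip whitespace and enforce the 32-character rule; raise ValueError otherwise."""
    key = (raw_key or '').strip()  # os.getenv returns None when the variable is unset
    if len(key) != EXPECTED_KEY_LENGTH:
        raise ValueError(
            f"Invalid API key format: expected {EXPECTED_KEY_LENGTH} characters, got {len(key)}"
        )
    return key
```

In the notebook it would be called as `validate_api_key(os.getenv('app_id'))` after `dotenv.load_dotenv()`.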


In [3]:
def load_env_variables(logger: logging.Logger) -> str:
    """
    Loads the OpenWeatherMap API key from a .env file.

    - Ensures secure management of sensitive data.
    - Validates the API key format and logs errors if invalid.

    Args:
        logger (logging.Logger): Logger instance for logging events.

    Returns:
        str: The API key (app_id) from the environment variables.

    Raises:
        FileNotFoundError: If the .env file is missing or defines no variables.
        ValueError: If the API key is missing or is not 32 characters long.
    """
    # Load environment variables from a .env file; load_dotenv() returns False if nothing was loaded
    if not dotenv.load_dotenv():
        logger.error(".env file not found or could not be loaded")
        raise FileNotFoundError("Missing .env file")

    # Retrieve the API key (os.getenv returns None if it is unset) and trim stray whitespace
    app_id = (os.getenv('app_id') or '').strip()

    # Validate the API key: it must be exactly 32 characters long
    if len(app_id) != 32:
        logger.error("app_id is missing, empty, or invalid in environment variables")
        raise ValueError("Invalid API key format")

    logger.info("app_id loaded successfully")
    return app_id

# Loading City List

## Purpose

Loads and processes city data for API consumption by:

- Reading from `Data/city_list.json`
- Filtering cities by country code (default: `'US'`)
- Splitting the results into chunks of at most 20 cities, the most the Group endpoint accepts per request

## Details

### Input Requirements

- **Country Code**: 2-letter ISO 3166-1 alpha-2 code (e.g., `'US'`, `'GB'`)
- **JSON File Structure**:
  ```json
  [
    {
      "id": 123456,
      "name": "CityName",
      "country": "CC"
    }
  ]
  ```
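The filter-and-chunk step can be sketched on in-memory data, which makes the chunk sizes easy to check. The helper name `filter_and_chunk` and the sample entries are hypothetical; the logic mirrors the rules in this section: skip malformed entries, keep one country, and slice into groups of 20.

```python
def filter_and_chunk(cities: list, country: str = 'US', chunk_size: int = 20) -> list:
    """Keep well-formed entries for `country` and split them into chunks of `chunk_size`."""
    required = ('id', 'name', 'country')
    matching = [
        city for city in cities
        if isinstance(city, dict)
        and all(key in city for key in required)
        and city['country'] == country.upper()
    ]
    # Slice the filtered list in steps of chunk_size; the last chunk may be shorter
    return [matching[i:i + chunk_size] for i in range(0, len(matching), chunk_size)]


# Hypothetical input: 45 US cities, 3 GB cities, and one malformed entry
sample = (
    [{'id': n, 'name': f'City{n}', 'country': 'US'} for n in range(45)]
    + [{'id': 100 + n, 'name': f'Town{n}', 'country': 'GB'} for n in range(3)]
    + [{'id': 999}]
)
chunks = filter_and_chunk(sample)
print([len(chunk) for chunk in chunks])  # [20, 20, 5]
```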


In [4]:
def load_cities(logger: logging.Logger, country: str = 'US') -> list[list[dict]]:
    """
    Loads a list of cities from a JSON file and filters them by country.

    - Reads city data from a JSON file.
    - Filters cities by the specified country code.
    - Splits the list into chunks of 20 cities for API requests.

    Args:
        logger (logging.Logger): Logger instance for logging events.
        country (str): Country code to filter cities by (default is 'US').

    Returns:
        list[list[dict]]: A list of city chunks, each containing up to 20 city dictionaries.

    Raises:
        SystemExit: If the file is missing, JSON is invalid, or data is malformed.
    """
    try:
        # Validate country code
        if not isinstance(country, str) or len(country) != 2:
            logger.error(f"Invalid country code: {country}. Must be a 2-letter ISO code.")
            raise TypeError('Invalid country code')

        # Build the full path to the JSON file
        file_path = os.path.join('Data', 'city_list.json')

        # Open and parse the JSON file
        with open(file_path, 'r', encoding='utf-8') as f:
            cities_json = json.load(f)

        if not cities_json:
            logger.error("JSON file is empty")
            raise ValueError("Empty JSON file")

        # Filter cities by country and validate entries
        cities = []
        for city in cities_json:
            if not all(key in city for key in ['id', 'name', 'country']):
                logger.warning(f"Skipping invalid city entry: {city}")
                continue

            if city['country'] == country:
                cities.append(city)

        if not cities:
            logger.error(f"No valid cities found for country {country}")
            raise ValueError(f"No valid cities found for country {country}")

        logger.info(f"Loaded {len(cities)} unique {country} cities from the JSON file")

        # Split the list into chunks of 20 items each
        city_list_chunks = [cities[i:i+20] for i in range(0, len(cities), 20)]
        logger.debug(f"Created {len(city_list_chunks)} chunks of up to 20 cities each")
        return city_list_chunks

    except FileNotFoundError:
        logger.error(f"File not found: {file_path}")
        sys.exit(1)
    except json.JSONDecodeError as e:
        logger.error(f"Error decoding JSON: {e}")
        sys.exit(1)
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        sys.exit(1)

# Fetching Weather Data

## Purpose

Retrieves weather data from the OpenWeatherMap Group API with robust error handling:

- Processes cities in efficient chunks (max 20 per API call)
- Implements automatic retries with exponential backoff
- Provides detailed progress tracking and error logging

## Implementation Details

### Core Functions

#### `_make_api_request(url: str, params: dict, logger: logging.Logger) -> dict`

**Internal helper for resilient API calls**:

- ⚡ **Timeout**: 10-second request timeout
- 🔁 **Retry Logic** (via `tenacity`):
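  - Up to 3 attempts, with exponential backoff between them
  - Triggers on:
    - Connection errors
    - Timeouts
    - Any `requests.HTTPError` (4XX and 5XX, including non-transient errors such as 401)
- 🚨 **Error Handling**:
  - Logs each failure before re-raising it
  - Raises `requests.HTTPError` for 4XX/5XX responses via `raise_for_status()`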
- ✅ **Success**: Returns parsed JSON response
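The retry timing can be worked out by hand. The sketch below assumes tenacity's `wait_exponential` computes `multiplier * 2 ** (attempt_number - 1)` clamped to `[min, max]` (as I read its source), so with `stop_after_attempt(3)` there are two sleeps, of 1 s and 2 s. It also includes a hypothetical `is_retryable_status` predicate showing how retries could be narrowed to rate limiting (429) and server errors (5XX); wiring it in (for example through `tenacity.retry_if_exception` inspecting `exc.response.status_code`) is an assumption, not current behavior.

```python
def backoff_delays(attempts: int = 3, multiplier: float = 1, min_wait: float = 1,
                   max_wait: float = 10, base: float = 2) -> list:
    """Sleeps between `attempts` tries, assuming wait = multiplier * base**(n - 1),
    clamped to [min_wait, max_wait], where n is the number of the attempt that just failed."""
    return [
        max(min_wait, min(multiplier * base ** (n - 1), max_wait))
        for n in range(1, attempts)  # No sleep follows the final attempt
    ]


def is_retryable_status(status_code: int) -> bool:
    """Treat rate limiting (429) and server errors (5XX) as transient; other 4XX as permanent."""
    return status_code == 429 or 500 <= status_code <= 599


print(backoff_delays())            # [1, 2]
print(backoff_delays(attempts=6))  # [1, 2, 4, 8, 10]
print(is_retryable_status(401), is_retryable_status(503))  # False True
```

Skipping retries for errors such as 401 (invalid key) avoids spending the backoff time on a request that cannot succeed.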

#### `fetch_weather_data(token: str, logger: logging.Logger, city_list: list[list[dict]]) -> list`

**Main data fetching workflow**:

1. **Input Validation**:

   - Verifies that the API key is present
   - Confirms that the list of city chunks is non-empty (otherwise logs an error and returns `[]`)

2. **Chunk Processing**:
   ```python
   for chunk in city_chunks:
       city_ids = [str(city['id']) for city in chunk]
       params = {'id': ','.join(city_ids), 'appid': api_key, 'units': 'metric'}
       response_data = _make_api_request('https://api.openweathermap.org/data/2.5/group', params, logger)
   ```
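To see what a single chunk request looks like on the wire, the sketch below builds the full URL with `urllib.parse.urlencode`. It should be equivalent to what `requests` produces from the `params` dict, though that equivalence is an assumption here. The helper `build_group_request` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

GROUP_URL = 'https://api.openweathermap.org/data/2.5/group'
MAX_IDS_PER_REQUEST = 20  # Chunk limit used throughout this pipeline


def build_group_request(chunk: list, api_key: str, units: str = 'metric') -> str:
    """Return the full Group API URL for one chunk of city dictionaries."""
    if not chunk or len(chunk) > MAX_IDS_PER_REQUEST:
        raise ValueError(f"Expected 1-{MAX_IDS_PER_REQUEST} cities, got {len(chunk)}")
    params = {
        'id': ','.join(str(city['id']) for city in chunk),
        'units': units,
        'appid': api_key,
    }
    # urlencode percent-encodes the commas in the id list as %2C
    return f"{GROUP_URL}?{urlencode(params)}"


print(build_group_request([{'id': 1}, {'id': 2}], 'YOUR_KEY'))
# https://api.openweathermap.org/data/2.5/group?id=1%2C2&units=metric&appid=YOUR_KEY
```

Because the API key ends up in the query string, full request URLs are best kept out of the logs.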


In [5]:
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type((requests.ConnectionError, requests.Timeout, requests.HTTPError)),
    reraise=True,  # Re-raise the last underlying exception instead of tenacity.RetryError
    before_sleep=lambda retry_state: logging.getLogger(__name__).warning(
        f"API request attempt {retry_state.attempt_number}/3 failed; "
        f"retrying in {retry_state.next_action.sleep:.2f}s")
)
def _make_api_request(url: str, params: dict, logger: logging.Logger) -> dict:
    """
    Helper function to make a single API request with error handling.

    - Makes up to 3 attempts, with exponential backoff between them.
    - Logs errors for 4XX/5XX status codes and other request exceptions.

    Args:
        url (str): API endpoint URL.
        params (dict): Query parameters for the API request.
        logger (logging.Logger): Logger instance for logging events.

    Returns:
        dict: JSON response from the API.
    """
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()  # Raise an exception for non-200 status codes
        return response.json()
    
    except requests.HTTPError as e:
        logger.error(f"HTTP error during API request: {e}")
        raise
    except requests.RequestException as e:
        logger.error(f"Request error during API request: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error during API request: {e}")
        raise

def fetch_weather_data(token: str, logger: logging.Logger, city_list: list[list[dict]]) -> list:
    """
    Fetches weather data for a list of city IDs using the OpenWeatherMap Group API.

    - Processes cities in chunks of up to 20.
    - Logs the number of cities fetched per chunk and handles errors gracefully.

    Args:
        token (str): OpenWeatherMap API key.
        logger (logging.Logger): Logger instance for logging events.
        city_list (list[list[dict]]): List of city chunks, each containing up to 20 city dictionaries.

    Returns:
        list: List of weather data dictionaries for all cities.
    """
    if not token or not city_list:
        logger.error("Invalid input: token or city_list is empty")
        return []

    cities_data = []
    api_url = 'https://api.openweathermap.org/data/2.5/group'  # HTTPS keeps the API key off the wire in plain text

    for i, cities_block in enumerate(city_list, 1):
        if not cities_block:
            logger.warning(f"Skipping empty city chunk {i}")
            continue

        city_ids = [city['id'] for city in cities_block]
        logger.debug(f"Processing chunk {i}/{len(city_list)} with {len(city_ids)} cities: {city_ids[:5]}...")

        payload = {
            'id': ','.join(map(str, city_ids)),
            'appid': token,
            'units': 'metric'
        }

        try:
            response_data = _make_api_request(api_url, payload, logger)
            weather_data = response_data.get('list', [])

            if weather_data:
                cities_data.extend(weather_data)
                logger.info(f"Chunk {i}: Fetched weather data for {len(weather_data)} cities")
            else:
                logger.warning(f"Chunk {i}: No weather data in response")

        except Exception as e:
            logger.error(f"Chunk {i}: Failed to fetch weather data: {e}")
            continue

    logger.info(f"Total weather data fetched for {len(cities_data)} cities")
    return cities_data

# Formatting Local Time

## Purpose

Converts UNIX timestamps to human-readable local time by:

- Adjusting for timezone offsets
- Formatting as 12-hour `HH:MM AM/PM` strings

---

## Function: `format_time_to_local_time(timestamp: int, offset_seconds: int, logger: logging.Logger) -> str`

### **Input Validation**

- Ensures `timestamp` and `offset_seconds` are integers
- Logs an error and returns `'N/A'` for invalid types or values

### **Conversion Process**

1. **UTC Baseline**:
   ```python
   from datetime import datetime, timezone
   utc_time = datetime.fromtimestamp(timestamp, timezone.utc)  # UNIX timestamp → timezone-aware UTC datetime
   ```
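An alternative to shifting a UTC datetime by the offset is to attach a fixed-offset `tzinfo` directly. The sketch below takes that approach and also shows round-to-nearest-minute done correctly: add 30 seconds first, then truncate. The helper name `to_local_hhmm` is hypothetical, and the offset is treated as fixed (no DST rules), just as the API's seconds-based offset is.

```python
from datetime import datetime, timedelta, timezone


def to_local_hhmm(timestamp: int, offset_seconds: int) -> str:
    """Convert a UNIX timestamp to 'HH:MM AM/PM' in a fixed UTC offset, rounded to the minute."""
    local_tz = timezone(timedelta(seconds=offset_seconds))  # Fixed offset, no DST rules
    local_time = datetime.fromtimestamp(timestamp, local_tz)
    # Add 30 s before truncating so that :30 and later rounds up to the next minute
    rounded = (local_time + timedelta(seconds=30)).replace(second=0, microsecond=0)
    return rounded.strftime('%I:%M %p')


print(to_local_hhmm(0, -18000))  # 07:00 PM (1969-12-31 in UTC-05:00)
print(to_local_hhmm(29, 0))      # 12:00 AM
print(to_local_hhmm(30, 0))      # 12:01 AM
```

Truncating first and then adding 30 seconds would never change the minute, so the order of the two steps matters.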


In [6]:
def format_time_to_local_time(timestamp: int, offset_seconds: int, logger: logging.Logger) -> str:
    """
    Converts a UNIX timestamp to local time based on a timezone offset.

    - Adjusts the timestamp for the timezone offset.
    - Formats the time as 'HH:MM AM/PM'.

    Args:
        timestamp (int): The UNIX timestamp.
        offset_seconds (int): The timezone offset in seconds.
        logger (logging.Logger): Logger instance for logging events.

    Returns:
        str: The formatted local time as a string, or 'N/A' on error.
    """
    try:
        if not isinstance(timestamp, int) or not isinstance(offset_seconds, int):
            logger.error(f"Invalid timestamp ({timestamp}) or offset ({offset_seconds})")
            return 'N/A'

        # Convert the timestamp to a timezone-aware UTC datetime
        utc_time = datetime.fromtimestamp(timestamp, timezone.utc)

        # Apply the timezone offset to get the local wall-clock time
        local_time = utc_time + timedelta(seconds=offset_seconds)

        # Round to the nearest minute: add 30 seconds first, then drop seconds and microseconds
        rounded_time = (local_time + timedelta(seconds=30)).replace(second=0, microsecond=0)

        # Format the time as a string in 'HH:MM AM/PM' format
        return rounded_time.strftime('%I:%M %p')

    except (ValueError, OSError, OverflowError) as e:
        logger.error(f"Error formatting time: {e}")
        return 'N/A'

# Saving Data to CSV

## Purpose

Persists weather data to `Weather Data.csv` with:

- Structured attribute extraction
- Missing value handling
- Local time conversion for timestamps

---

## Function: `save_data_to_csv(data: list, logger: logging.Logger) -> None`

### **Input Validation**

- Checks that `data` is non-empty
- Logs an error and returns without writing a file if there is nothing to save

### **Data Processing Pipeline**

1. **Attribute Extraction**:
   ```python
   data_list = []
   for city in data:
       main = city.get('main', {})
       sys_info = city.get('sys', {})  # Group API responses nest 'timezone' under 'sys'
       data_list.append({
           'City ID': city.get('id', 'N/A'),
           'City Name': city.get('name', 'N/A'),
           'Temperature': f"{main.get('temp', 'N/A')} °C",
           'Humidity': f"{main.get('humidity', 'N/A')}%",
           'Sunrise Time (Local)': format_time_to_local_time(
               sys_info.get('sunrise'), sys_info.get('timezone'), logger),
           'Sunset Time (Local)': format_time_to_local_time(
               sys_info.get('sunset'), sys_info.get('timezone'), logger),
           'Weather Description': (city.get('weather') or [{}])[0].get('description', 'N/A'),
       })
   ```
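Chained lookups on API records can still fail on an empty `weather` list or a missing nested key. A hypothetical `safe_get` helper centralizes the `'N/A'` fallback in one place; the sample record below is trimmed and made up for illustration, shaped like one entry of the Group API's `list`.

```python
from typing import Any


def safe_get(record: Any, *path: Any, default: Any = 'N/A') -> Any:
    """Follow `path` through nested dicts/lists, returning `default` at the first missing step."""
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical, trimmed record shaped like one entry of the Group API 'list'
record = {'id': 1, 'main': {'temp': 21.5}, 'weather': []}
print(safe_get(record, 'main', 'temp'))               # 21.5
print(safe_get(record, 'main', 'humidity'))           # N/A
print(safe_get(record, 'weather', 0, 'description'))  # N/A (empty list)
```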


In [7]:
def _format_utc_offset(offset_seconds: int) -> str:
    """Formats a UTC offset in seconds as 'UTC±HH:MM' (e.g., -18000 -> 'UTC-05:00')."""
    sign = '+' if offset_seconds >= 0 else '-'
    hours, remainder = divmod(abs(offset_seconds), 3600)  # abs() avoids floor division on negatives
    return f"UTC{sign}{hours:02d}:{remainder // 60:02d}"


def save_data_to_csv(data: list, logger: logging.Logger) -> None:
    """
    Saves weather data to a CSV file with detailed information for each city.

    - Extracts weather attributes into a structured format.
    - Saves the data to a CSV file named 'Weather Data.csv'.

    Args:
        data (list): List of weather data dictionaries.
        logger (logging.Logger): Logger instance for logging events.
    """
    output_file = 'Weather Data.csv'  # Defined before `try` so the error handlers can reference it
    try:
        if not data:
            logger.error("No weather data to save to CSV")
            return

        # Prepare one flat row per city
        data_list = []
        for city in data:
            main = city.get('main', {})
            sys_info = city.get('sys', {})  # Group API responses nest 'timezone' under 'sys'
            weather = (city.get('weather') or [{}])[0]  # Guard against a missing or empty list
            wind = city.get('wind', {})
            coord = city.get('coord', {})
            offset = sys_info.get('timezone')

            data_list.append({
                'City ID': city.get('id', 'N/A'),
                'City Name': city.get('name', 'N/A'),
                'Timezone': _format_utc_offset(offset) if isinstance(offset, int) else 'N/A',
                'Weather Condition': weather.get('main', 'N/A'),
                'Weather Description': weather.get('description', 'N/A'),
                'Temperature': f"{main.get('temp', 'N/A')} °C",
                'Feels Like': f"{main.get('feels_like', 'N/A')} °C",
                'Max Temperature': f"{main.get('temp_max', 'N/A')} °C",
                'Min Temperature': f"{main.get('temp_min', 'N/A')} °C",
                'Humidity': f"{main.get('humidity', 'N/A')}%",
                'Cloud Coverage': f"{city.get('clouds', {}).get('all', 'N/A')}%",
                'Visibility': f"{city.get('visibility', 'N/A')} m",
                'Wind Speed': f"{wind.get('speed', 'N/A')} m/s",
                'Wind Direction': f"{wind.get('deg', 'N/A')}°",
                'Atmospheric Pressure (hPa)': main.get('pressure', 'N/A'),
                'Sea Level Pressure (hPa)': main.get('sea_level', 'N/A'),
                'Ground Level Pressure (hPa)': main.get('grnd_level', 'N/A'),
                # Missing timestamps or offsets yield 'N/A' instead of a bogus 1970 time
                'Sunrise Time (Local)': format_time_to_local_time(sys_info.get('sunrise'), offset, logger),
                'Sunset Time (Local)': format_time_to_local_time(sys_info.get('sunset'), offset, logger),
                'Latitude': coord.get('lat', 'N/A'),
                'Longitude': coord.get('lon', 'N/A'),
            })

        # Create a DataFrame and fill any remaining missing values
        df = pd.DataFrame(data_list).fillna('N/A')

        # Save the DataFrame to a CSV file
        df.to_csv(output_file, index=False, encoding='utf-8')
        logger.info(f"Weather data successfully saved to '{output_file}' with {len(df)} records")

    except PermissionError:
        logger.error(f"Permission denied when writing to '{output_file}'. Check file permissions.")
    except Exception as e:
        logger.error(f"Unexpected error while saving data to CSV: {e}")

# Main Function and Execution

## Purpose

Orchestrates the end-to-end weather data pipeline:

1. **Initializes** logging and environment
2. **Loads** required configurations
3. **Fetches** and **processes** weather data
4. **Persists** results to CSV

---

## Function: `main() -> None`

### **Workflow Steps**

```mermaid
graph TD
    A[Start] --> B[Initialize Logger]
    B --> C[Load API Key]
    C --> D[Load City Chunks]
    D --> E[Fetch Weather Data]
    E --> F[Save to CSV]
    F --> G[Log Completion]
```
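The workflow above can also be expressed with the stages passed in as callables, which makes the control flow testable without network access or files. This is a hypothetical variant, not the notebook's `main()`: the exit-code convention (0 success, 1 setup failure, 2 nothing fetched) is an assumption. Note that `load_cities` already calls `sys.exit(1)` on failure, and `SystemExit` is not caught here.

```python
import logging
from typing import Callable


def run_pipeline(load_key: Callable[[], str],
                 load_chunks: Callable[[], list],
                 fetch: Callable[[str, list], list],
                 save: Callable[[list], None],
                 logger: logging.Logger) -> int:
    """Run the four stages in order and return a process exit code (hypothetical convention)."""
    try:
        api_key = load_key()
        chunks = load_chunks()
    except (FileNotFoundError, ValueError) as exc:
        logger.error(f"Setup failed: {exc}")
        return 1
    records = fetch(api_key, chunks)
    if not records:
        logger.error("No weather data fetched; skipping CSV export")
        return 2
    save(records)
    logger.info(f"Pipeline finished with {len(records)} records")
    return 0


# Stub stages stand in for the real functions, so no network access or files are needed
log = logging.getLogger('pipeline_demo')
saved = []
code = run_pipeline(lambda: 'k' * 32, lambda: [[{'id': 1}]],
                    lambda key, chunks: [{'id': c['id']} for chunk in chunks for c in chunk],
                    saved.extend, log)
print(code, saved)  # 0 [{'id': 1}]
```

With the real functions, the stages would be wrapped as, for example, `lambda key, chunks: fetch_weather_data(key, logger, chunks)`, and the result passed to `sys.exit()`.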


In [8]:
def main():
    """
    Main function to execute the weather data processing pipeline.

    - Initializes the logger.
    - Loads the API key from environment variables.
    - Loads and processes the city list.
    - Fetches weather data for the cities.
    - Saves the weather data to a CSV file.

    This function ties all components together into a cohesive workflow.
    """
    logger = set_logger()
    api_key = load_env_variables(logger)  # Securely load the API key
    city_chunks = load_cities(logger)  # Load and chunk the city list
    weather_data = fetch_weather_data(api_key, logger, city_chunks)  # Fetch weather data for the cities
    save_data_to_csv(weather_data, logger)  # Save the weather data to a CSV file

if __name__ == '__main__':
    main()