# Data Visualization Script

**Author**: Jeff Calderon  
**Date**: October 2, 2024  

### Description:
This script retrieves statistics from the U.S. Bureau of Labor Statistics API (www.bls.gov) and visualizes trends over time using the Python libraries Pandas and Matplotlib. It produces one plot per data type. The process includes:

1. **Importing Necessary Libraries**: To handle data manipulation and visualization.
2. **Defining the Data**: Fetching the selected series from the BLS API and building a DataFrame with Year, Month, and Value columns, plus the series ID, industry, and data type.
3. **Preprocessing**:
   - Mapping month names to numeric values.
   - Converting values to numeric types.
4. **Creating a Combined Date Column**: By merging Year and Month for accurate time-series plotting.
5. **Plotting Data**:
   - Generating scatter plots with appropriate axis labels, titles, and limits for clarity.

### Purpose:
To provide a clear and concise visualization of data points over time, enabling better analysis and interpretation of trends.

### Example Plot:
![Values Over Time](LookWhatICanDo.png)


In [None]:
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import random
from prettytable import PrettyTable


In [None]:
# Load the data type dictionary for use when plotting
with open('data_type_dict.json', 'r') as file:
    data_type_dict = json.load(file)

The requests need an API key and the URL of the server that hosts the BLS data.


In [None]:
BLS_API_KEY = 'aa6acc3f370041248afd5ca7de81b2fa'
BLS_ENDPOINT = "https://api.bls.gov/publicAPI/v2/timeseries/data/"
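
The key can also be read from the environment rather than written into the notebook. The following is a minimal sketch, assuming the key has been exported in an environment variable named `BLS_API_KEY`. The helper name `load_bls_api_key` is hypothetical and not part of the original script.

```python
import os

def load_bls_api_key(env_var="BLS_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name you exported.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your BLS API key")
    return key
```

With the variable set, `BLS_API_KEY = load_bls_api_key()` would take the place of the hard-coded string.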



    Pass in a list of BLS time series IDs to fetch data and return the
    response as parsed JSON. Arguments:
        - series (list of 1 to 25 series IDs)
        - BLS_API_KEY (API registration key from the BLS website)
        - startyear (4 digit year)
        - endyear (4 digit year)


In [None]:

def fetch_bls_series(series, BLS_API_KEY, startyear, endyear):
    """Fetch the given BLS series for startyear..endyear and return the parsed JSON."""
    if len(series) < 1 or len(series) > 25:
        raise ValueError("Must pass in between 1 and 25 series ids")

    # Headers tell the server what kind of data we are sending, in this case JSON
    headers = {'Content-Type': 'application/json'}

    # POST payload: the series we want, the year range, and our API key
    payload = json.dumps({
        'seriesid': series,
        'startyear': str(startyear),
        'endyear': str(endyear),
        'registrationKey': BLS_API_KEY,
    })

    # Fetch the response from the BLS API and fail on HTTP errors
    response = requests.post(BLS_ENDPOINT, data=payload, headers=headers)
    response.raise_for_status()

    # Parse the JSON result and check the API-level status
    result = response.json()
    if result['status'] != 'REQUEST_SUCCEEDED':
        raise Exception(result['message'][0])
    return result
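
Because `fetch_bls_series` rejects requests with more than 25 series IDs, a longer list has to be split into batches first. The following is a minimal sketch of one way to do that. `chunk_series_ids` is a hypothetical helper, not part of the original script, and the IDs in the example are placeholders rather than real BLS series.

```python
def chunk_series_ids(series_ids, batch_size=25):
    """Split a list of series IDs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [series_ids[i:i + batch_size] for i in range(0, len(series_ids), batch_size)]

# Example with placeholder IDs (not real BLS series)
example_ids = [f"ID{n:03d}" for n in range(60)]
batches = chunk_series_ids(example_ids)
print([len(batch) for batch in batches])  # [25, 25, 10]
```

Each batch could then be passed to `fetch_bls_series` in turn and the returned series combined.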

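Series ID format (example: CEU6054151201)
- CE: prefix for National Employment, Hours, and Earnings
- U: seasonal adjustment code (U = not seasonally adjusted, S = seasonally adjusted)
- 60: supersector, Professional and Business Services
- 541512: industry, Computer systems design services
- 01: data type code

The last two digits of the ID are the data type code. Shorter industry codes are padded with zeros so that the supersector and industry codes together fill eight digits and the full ID is 13 characters (for example, industry 54151 becomes 60541500).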



| Positions | Value    | Field Name                     |
|-----------|----------|--------------------------------|
| 1-2       | CE       | Prefix                         |
| 3         | U        | Seasonal Adjustment Code       |
| 4-11      | 08000000 | Supersector and Industry Codes |
| 12-13     | 03       | Data Type Code                 |
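
The position layout above can be turned into a small parser. This is a minimal sketch, assuming every ID follows the 13-character layout described here. `parse_series_id` and its field names are hypothetical, not part of the original script.

```python
def parse_series_id(series_id):
    """Split a 13-character series ID into its fields by position."""
    if len(series_id) != 13:
        raise ValueError(f"Expected 13 characters, got {len(series_id)}: {series_id!r}")
    return {
        'prefix': series_id[0:2],             # positions 1-2
        'seasonal_adjustment': series_id[2],  # position 3
        'industry': series_id[3:11],          # positions 4-11
        'data_type': series_id[11:13],        # positions 12-13
    }

print(parse_series_id('CEU6054151201'))
# {'prefix': 'CE', 'seasonal_adjustment': 'U', 'industry': '60541512', 'data_type': '01'}
```

The length check catches IDs that were typed with a missing or extra digit before they are sent to the API.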
    

In [None]:
series_dict = {
    'CEU6054150011': 'Computer systems design and related services weekly',
    'CEU6054151101': 'Custom computer programming services weekly',
    'CEU6054151201': 'Computer systems design services weekly',
    'CEU6054151301': 'Computer facilities management services weekly',
    'CEU6054151901': 'Other computer related services weekly',
    'CES6054150001': 'Computer systems design and related services employment',
    'CES6054151101': 'Custom computer programming services employment',
    'CES6054151201': 'Computer systems design services employment',
    'CES6054151301': 'Computer facilities management services employment',
    'CES6054151901': 'Other computer related services employment',
    'CEU6056100001' : 'Administrative and Support Services Employment',
    'CEU6056100006' : 'Administrative and Support Services Pay',
    'CEU8081210001' : 'Personal Care Services Employment',
    'CEU8081210006' : 'Personal Care Services Pay'
}
series_list = list(series_dict.keys())  # list of the series IDs to request

json_data = fetch_bls_series(series_list, BLS_API_KEY, 2015, 2024)

In [None]:
# Flatten the JSON response into a list of row dictionaries
data_list = []
for series in json_data['Results']['series']:
    if series['data']:
        series_num = series['seriesID']
        ind_name = series_dict[series_num]
        # The last two characters of the series ID are the data type code
        data_type = data_type_dict[series_num[-2:]]

        for entry in series['data']:
            # Keep year and month name separately; they are combined into a 'Date' column later
            data_list.append({
                'Series ID': series_num,
                'Industry': ind_name,
                'Year': entry['year'],
                'Month': entry['periodName'],
                'Type': data_type,
                'Value': entry['value']
            })
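
To make the shape of the response concrete, here is a minimal sketch that flattens a hand-written sample with the same nesting the loop above reads (`Results`, then `series`, then `data`). The values are made up for illustration and are not real BLS figures, and `flatten_series` is a hypothetical helper.

```python
# Hand-written sample with the same nesting as the API response (values are invented)
sample_response = {
    'status': 'REQUEST_SUCCEEDED',
    'Results': {
        'series': [
            {
                'seriesID': 'CEU6054151201',
                'data': [
                    {'year': '2024', 'period': 'M02', 'periodName': 'February', 'value': '100.0'},
                    {'year': '2024', 'period': 'M01', 'periodName': 'January', 'value': '99.5'},
                ],
            }
        ]
    },
}

def flatten_series(response):
    """Return one row dictionary per data entry in the response."""
    rows = []
    for series in response['Results']['series']:
        for entry in series.get('data', []):
            rows.append({
                'Series ID': series['seriesID'],
                'Year': entry['year'],
                'Month': entry['periodName'],
                'Value': entry['value'],
            })
    return rows

rows = flatten_series(sample_response)
print(len(rows), rows[0]['Month'])  # 2 February
```

Note that the year and value arrive as strings, which is why the values are converted to numeric types before plotting.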

We have our data. Now I can create the DataFrame. The month names need to be mapped to numbers for the plots to work later, so I prepare the data here.

In [None]:

data_frame = pd.DataFrame(data_list)

# Map month names to numbers
month_map = {
    'January': 1,
    'February': 2,
    'March': 3,
    'April': 4,
    'May': 5,
    'June': 6,
    'July': 7,
    'August': 8,
    'September': 9,
    'October': 10,
    'November': 11,
    'December': 12
}
# Replace month names with month numbers
data_frame['Month'] = data_frame['Month'].map(month_map)

# Combine year and month into a single datetime column (first day of each month)
data_frame['Date'] = pd.to_datetime(data_frame[['Year', 'Month']].assign(Day=1))

# Convert 'Value' column to numeric
data_frame['Value'] = pd.to_numeric(data_frame['Value'], errors='coerce')
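
The month lookup can also be built from the standard library instead of being typed by hand. This is a sketch using `calendar.month_name`. The names it yields depend on the active locale, so it matches the English names returned by the API only when the locale is English or the default. The variable name `month_map_from_calendar` is hypothetical.

```python
import calendar

# calendar.month_name[0] is an empty string, so skip it
month_map_from_calendar = {
    name: number for number, name in enumerate(calendar.month_name) if name
}

print(month_map_from_calendar['January'], month_map_from_calendar['December'])  # 1 12
```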




In [None]:

# Collect the unique data types
all_types = data_frame['Type'].unique()
# Display one plot per data type
for data_type in all_types:
    sub_df = data_frame[data_frame['Type'] == data_type]
    plt.figure(figsize=(12, 6))
    plt.scatter(sub_df['Date'], sub_df['Value'], marker='o', label=data_type)
    plt.title(f'{data_type} Over Years')
    plt.xlabel('Date')
    plt.ylabel('Value')
    # Pad the y-axis limits around this data type's own range
    plt.ylim(sub_df['Value'].min() - 10, sub_df['Value'].max() + 10)
    plt.grid(True)
    plt.legend()
    plt.show()