Bhavana Geereddy

Data Analysis Project Phase 1: Data Collection and MongoDB

Following the project instructions, Phase 1 of the Data Analysis Project covers the tasks below.


Task 1: Data collection
1. Write a Python script to download data from the Internet.
2. Ensure that the downloaded data is sufficient for meaningful analysis in future projects without being too large to upload to MongoDB.
3. Provide information about the data source, including its origin, structure, time frame, and size.

Task 2: Write JSON data to MongoDB
1. Write the collected data to a MongoDB Atlas collection.
2. Validate the correctness of data insertion into MongoDB.
3. Save all data to a local text file in valid JSON format for reference and backup.

In [1]:
import requests
from pymongo import MongoClient

In [2]:
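# Read the MongoDB connection string from an environment variable so that
# credentials are not stored in the notebook (MONGODB_URI is an assumed name).
# Expected form: mongodb+srv://<username>:<password>@<cluster-host>/
import os
mongo_uri = os.environ["MONGODB_URI"]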

In [3]:
# Function to insert data into MongoDB
def insert_data_to_mongodb(data):
    client = None
    try:
        # Connect to MongoDB
        client = MongoClient(mongo_uri)

        # Create/connect to the database and collection
        db = client['electric_vehicle_data']
        collection = db['vehicles']

        # Insert data into MongoDB
        collection.insert_many(data)

        print("Data successfully inserted into MongoDB.")

    except Exception as e:
        print("Failed to write data to MongoDB:", e)

    finally:
        # Close the connection whether or not the insert succeeded
        if client is not None:
            client.close()

In [4]:
# URL to download JSON data
json_url = 'https://data.wa.gov/api/views/f6w7-q2d2/rows.json?accessType=DOWNLOAD'

In [5]:
# Download JSON data from the URL
response = requests.get(json_url, timeout=60)
response.raise_for_status()  # Stop here if the download failed
json_data = response.json()

In [6]:
# Extract the data from the JSON response
data = json_data['data'][:1000]  # Extracting only 1000 records 
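As a rough check on the "not too large" requirement from Task 1, the serialized size of the selected rows can be estimated before uploading. The sketch below uses made-up sample rows and a hypothetical helper name, estimated_json_size; in the notebook, `data` would be passed instead. The JSON size is only a proxy, since MongoDB stores documents in its own binary format.

```python
import json

def estimated_json_size(rows):
    """Return the size in bytes of the rows when serialized as UTF-8 JSON."""
    return len(json.dumps(rows).encode("utf-8"))

# Made-up sample rows standing in for json_data['data'][:1000]
sample_rows = [["row-1", "King", "Seattle"], ["row-2", "Kitsap", "Bremerton"]]

size_bytes = estimated_json_size(sample_rows)
print(f"{len(sample_rows)} rows, about {size_bytes} bytes as JSON")
```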

In [7]:
# Extracting column names from the metadata
columns = [column['name'] for column in json_data['meta']['view']['columns']]

In [8]:
# Converting the nested data into a list of dictionaries
records = [dict(zip(columns, record)) for record in data]
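To illustrate the conversion above, here is a small self-contained sketch with made-up column names and rows (the sample_ names are hypothetical and chosen so they do not overwrite the notebook's variables). It shows how zip pairs each column name with the value at the same position in a row.

```python
# Made-up column names and rows, shaped like the metadata and data sections of the download
sample_columns = ["sid", "County", "City"]
sample_rows = [
    ["row-1", "King", "Seattle"],
    ["row-2", "Kitsap", "Bremerton"],
]

# Pair each column name with the value at the same position in the row
sample_records = [dict(zip(sample_columns, row)) for row in sample_rows]

print(sample_records[0])
# {'sid': 'row-1', 'County': 'King', 'City': 'Seattle'}
```

Note that zip stops at the shorter of its inputs, so a row with fewer values than there are columns would produce a dictionary with fewer keys rather than an error.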

In [9]:
# Insert the data into MongoDB
insert_data_to_mongodb(records)

Data successfully inserted into MongoDB.
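Task 2 also asks for validation of the insert. One possible check, sketched below, compares the number of documents in the collection with the number of records sent and reads one document back. The helper name validate_insertion and the FakeCollection class are hypothetical; the fake collection is there only so the sketch runs without a database. With pymongo, the real collection object provides count_documents and find_one, which are called the same way.

```python
def validate_insertion(collection, records, key):
    """Check that the collection holds at least len(records) documents and
    that the first record can be found again by its value for key."""
    count = collection.count_documents({})
    found = collection.find_one({key: records[0][key]}) if records else None
    ok = count >= len(records) and (not records or found is not None)
    return ok, count


class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo collection (illustration only)."""

    def __init__(self, docs):
        self.docs = list(docs)

    def count_documents(self, query):
        # Count documents whose fields equal every key/value pair in the query
        return sum(1 for d in self.docs if all(d.get(k) == v for k, v in query.items()))

    def find_one(self, query):
        # Return the first matching document, or None if nothing matches
        for d in self.docs:
            if all(d.get(k) == v for k, v in query.items()):
                return d
        return None


sample_records = [{"sid": "row-1", "City": "Seattle"}, {"sid": "row-2", "City": "Bremerton"}]
ok, count = validate_insertion(FakeCollection(sample_records), sample_records, key="sid")
print(f"Validation passed: {ok} ({count} documents found)")
```

The count is compared with >= rather than == because running the insert cell more than once adds the same records again, so the collection may hold more documents than a single run sent.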


The code below saves the collected data to a local text file in valid JSON format for reference and backup.

In [10]:
import json
import requests

In [11]:
# Function to download JSON data from the specified URL
def download_json_data(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        print("Failed to download data from the specified URL.")
        return None

In [12]:
# Function to write JSON data to a local file
def write_to_file(data, filename):
    with open(filename, 'w') as file:
        json.dump(data, file, indent=4)
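Since the task requires the backup file to be valid JSON, one way to confirm this is to read the file back and compare it with what was written. The sketch below is a minimal example under assumptions: the helper name is_valid_json_file, the sample rows, and the temporary file path are all made up for illustration.

```python
import json
import os
import tempfile

def is_valid_json_file(filename):
    """Return True if the file can be opened and parsed as JSON, False otherwise."""
    try:
        with open(filename, "r", encoding="utf-8") as file:
            json.load(file)
        return True
    except (OSError, ValueError):
        return False

# Write made-up rows to a temporary file, then read them back
sample_rows = [["row-1", "King", "Seattle"], ["row-2", "Kitsap", "Bremerton"]]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as file:
        json.dump(sample_rows, file, indent=4)
    file_is_valid = is_valid_json_file(path)
    with open(path, "r", encoding="utf-8") as file:
        round_trip_matches = json.load(file) == sample_rows

print(file_is_valid, round_trip_matches)
```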

In [13]:
# URL to download JSON data
url = "https://data.wa.gov/api/views/f6w7-q2d2/rows.json?accessType=DOWNLOAD"


In [14]:
# Download JSON data
json_data = download_json_data(url)

In [16]:
if json_data:
    # Extract records from the loaded data, limiting to 1000 records
    # (each record is a list of values; the column names are in json_data['meta'])
    records = json_data['data'][:1000]

    # Write the data to a local file
    write_to_file(records, 'electric_vehicle_data.json')

    print("Data successfully downloaded, and written to a local file.")
    print("Basic Information about the Data:")
    print("Data Source: Washington State Department of Licensing")
    print("Data Attributes:")
    for column in json_data['meta']['view']['columns']:
        column_name = column.get('name', 'Unknown')
        column_description = column.get('description', 'No description available')
        print(f"- {column_name}: {column_description}")
    print("Time Frame: Data includes Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) registered as of December 31, 2023.")
    print(f"Data Size: {len(records)} records")
else:
    print("Failed to download JSON data.")

Data successfully downloaded, and written to a local file.
Basic Information about the Data:
Data Source: Washington State Department of Licensing
Data Attributes:
- sid: No description available
- id: No description available
- position: No description available
- created_at: No description available
- created_meta: No description available
- updated_at: No description available
- updated_meta: No description available
- meta: No description available
- VIN (1-10): The 1st 10 characters of each vehicle's Vehicle Identification Number (VIN).
- County: This is the geographic region of a state that a vehicle's owner is listed to reside within. Vehicles registered in Washington state may be located in other states.
- City: The city in which the registered owner resides.
- State: This is the geographic region of the country associated with the record. These addresses may be located in other states.
- Postal Code: The 5 digit zip code in which the registered owner resides.
- Model Year: The