In [13]:
import numpy as np
import pandas as pd

## This notebook extracts the station location, latitude and longitude columns from a CSV dataset and appends them to an existing CSV file

FOLLOW THE STEPS BELOW:

Run the code cell below. It will prompt you to enter the file location. The file location can be a raw.githubusercontent.com URL or a local file path (right-click the dataset and choose "Copy path").

In [14]:
#Default: url is None, so the function "load_data" will prompt you to enter a file location.
url = None

#Option 1: uncomment the line below (or assign your own location) to give url a value. Because url is then not None, "load_data" will not prompt for a location.
#url = "https://raw.githubusercontent.com/Chameleon-company/EVCFLO/main/datasets/T1_2023/New_Zealand/NZ_Public_EV_Charger_Data_2023-04-28%2000_05_06NZDT.csv"

#Option 2: leave url as None, unlike the example in Option 1. When the cell is run, "load_data" will prompt you to enter a file location.

def load_data(url=None):
    #Prompt for a location only when none was supplied
    if url is None:
        url = input("Please enter the URL or file path of the CSV file: ")

    data = pd.read_csv(url)
    return data

#Pass url through so that Option 1 actually takes effect
data = load_data(url)

Run the code cell below. It will report the number of rows and columns, as well as the column names and data types.

In [15]:
#Return the number of rows and the number of columns from the data loaded in the cell above.

print("This dataset has", data.shape[0], "rows and", data.shape[1], "columns.")

#Return the data types of each variable (column) from the data provided.

print("Columns & data types in the dataset:")
print(data.dtypes)

This dataset has 278 rows and 13 columns.
Columns & data types in the dataset:
Region              object
Locality            object
Operator            object
Owner               object
DC or AC            object
kW                   int64
Connectors          object
Num_installed        int64
Num_in_progress      int64
Address             object
Long               float64
Lat                float64
EECA funded         object
dtype: object


Run the code cell below. It will prompt you to enter the column names for station name / location, latitude, and longitude. Use the list above to pick the correct columns, or open the CSV file and find the right column names there. The names must match exactly, including capitalisation.

In [16]:
#Name of variable (column) representing station location
station = input("Enter the column name for Station_Location: ")

#Name of variable (column) representing Latitude
latitude = input("Enter the column name for Latitude: ")

#Name of variable (column) representing longitude
longitude = input("Enter the column name for Longitude: ")

#Create a new dataframe with the columns defined above (raises a KeyError if a column name is misspelled, instead of silently creating an empty column)
new_df = data[[station, latitude, longitude]].copy()

#Rename the new columns to the following
new_df = new_df.rename(columns={
    station: "Service_Station_Location",
    latitude: "Latitude",
    longitude: "Longitude"
})

#Return the number of rows and the number of columns in the new dataframe.
print("This dataset has", new_df.shape[0], "rows and", new_df.shape[1], "columns.")


#Return the data types of each variable (column) from the data provided.
print("Columns & data types in the dataset:")
print(new_df.dtypes)

This dataset has 278 rows and 3 columns.
Columns & data types in the dataset:
Service_Station_Location     object
Latitude                    float64
Longitude                   float64
dtype: object
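
The column selection above depends on the typed names matching the dataset exactly. As a minimal sketch, the same select-and-rename step can be wrapped in a function that checks the names first. The helper name `select_location_columns` and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

def select_location_columns(df, station, latitude, longitude):
    """Hypothetical helper: return a 3-column copy of df with standardised names."""
    requested = [station, latitude, longitude]
    # Report every name that is not a column of df before selecting anything
    missing = [name for name in requested if name not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in dataset: {missing}")
    return df[requested].rename(columns={
        station: "Service_Station_Location",
        latitude: "Latitude",
        longitude: "Longitude",
    })

# Toy data standing in for a loaded dataset (values are made up)
sample = pd.DataFrame({
    "Address": ["1 Example St", "2 Sample Rd"],
    "Lat": [-41.2, -36.9],
    "Long": [174.9, 174.7],
    "kW": [50, 25],
})

selected = select_location_columns(sample, "Address", "Lat", "Long")
print(selected.columns.tolist())
```

Calling the helper with a name that is not in the dataset, for example "Latitude" when the column is actually called "Lat", raises a KeyError listing the missing names.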


Use the cell below if you would like to save the new data to a new CSV file. Otherwise skip it; the code cell after it will ask for the existing file you would like to add the data to.

In [17]:
#Option A: paste the save location, including a file name for your new CSV (example: "file location/blank.csv")
#new_df.to_csv("file location/blank.csv", index=False)

#Option B: be prompted for the save location instead
#save_location = input("Enter the save location for the blank CSV file: ")
#new_df.to_csv(save_location, index=False)

Run the code cell below. It will prompt you to enter the file location of the CSV file that you would like to add the data to. This is the same process as the first code cell, except that we are picking the CSV file we want to add on to. The new data is appended using the pandas concat function, and rows that repeat the same station location, latitude and longitude are then removed with drop_duplicates, so you do not need to worry about accidentally adding duplicates. Note that the chosen CSV file is overwritten with the combined data.
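
Before running this on real files, the append-and-deduplicate step can be tried on toy data. The sketch below uses two small made-up DataFrames (all names and values are illustrative assumptions) with the same three columns, and shows how `pd.concat`, `duplicated` and `drop_duplicates` behave together.

```python
import pandas as pd

# Columns that together identify a station
key_columns = ["Service_Station_Location", "Latitude", "Longitude"]

# Made-up stand-ins for the existing CSV and the newly extracted data
existing = pd.DataFrame({
    "Service_Station_Location": ["A St", "B Rd"],
    "Latitude": [-41.0, -36.5],
    "Longitude": [174.0, 174.5],
})
incoming = pd.DataFrame({
    "Service_Station_Location": ["B Rd", "C Ave"],
    "Latitude": [-36.5, -45.0],
    "Longitude": [174.5, 170.0],
})

# Stack the two datasets; ignore_index renumbers the rows from 0
combined = pd.concat([existing, incoming], ignore_index=True)

# keep=False flags every copy in a duplicate group, including the first one
n_involved = int(combined.duplicated(subset=key_columns, keep=False).sum())

# drop_duplicates keeps the first occurrence of each station by default
deduplicated = combined.drop_duplicates(subset=key_columns).reset_index(drop=True)

print(len(combined), n_involved, len(deduplicated))  # prints: 4 2 3
```

Duplicates are matched on exact values, so two rows for the same station with coordinates recorded at different precision are not treated as duplicates.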

In [18]:
# Enter the file location for the CSV file to ADD the new data to
from_df_location = input("Enter the file location for the CSV file: ")

# Load CSV to ADD data to
from_df = pd.read_csv(from_df_location)

# Concatenate new data to chosen CSV file
combined_df = pd.concat([from_df, new_df], ignore_index=True)

# Columns that together identify a station
key_columns = ['Service_Station_Location', 'Latitude', 'Longitude']

# Flag every row that belongs to a duplicate group (keep=False marks all copies, including the first)
duplicate_mask = combined_df.duplicated(subset=key_columns, keep=False)
num_duplicates_before = duplicate_mask.sum()

# Find and print duplicated rows before dropping
duplicated_rows_before = combined_df[duplicate_mask]
print("Duplicated rows before dropping:")
print(duplicated_rows_before)

# Drop duplicates, keeping the first occurrence of each station
combined_df.drop_duplicates(subset=key_columns, inplace=True)

# Find and print duplicated rows after dropping (this should be empty)
duplicated_rows_after = combined_df[combined_df.duplicated(subset=key_columns, keep=False)]
print("Duplicated rows after dropping:")
print(duplicated_rows_after)

# Warning! This overwrites the existing CSV file with the combined dataset
combined_df.to_csv(from_df_location, index=False)

print(f"Number of duplicates before dropping: {num_duplicates_before}")

Duplicated rows before dropping:
                                Service_Station_Location   Latitude  \
175                     1 Stevens Grove, Lower Hutt 5010 -41.211143   
176                        100 Pah Rd, Auckland 1023, NZ -36.912204   
177                  103 Bridge Street, Karamea 7893, NZ -41.260034   
178          114-124 Jackson St, Petone, Lower Hutt 5012 -41.224706   
179                    120 Hobsonville Rd, Auckland 0618 -36.799100   
...                                                  ...        ...   
88635                             Roberts St, Taupo 3300 -38.690770   
88636     10 Wharf Street, Dunedin Central, Dunedin 9016 -45.880000   
88637           120 Kaiwaka Mangawhai Road, Kaiwaka 0975 -36.160000   
88638                    298 State Highway 1, Bulls 4894 -40.170000   
88639  Cheapa Campa, 575 Memorial Avenue, Burnside, C... -43.480000   

        Longitude  
175    174.904344  
176    174.770478  
177    172.132062  
178    174.871332  
179    174.644

Run the code cell below. We can now check the shape of the dataset that we have added more data to. The number of rows is the number of station locations, so this lets us track how many stations have been collected.

In [19]:
#combined_df now holds the contents of the CSV file with the new data added to it
print("This dataset has", combined_df.shape[0], "rows and", combined_df.shape[1], "columns.")

print("Columns & data types in the dataset:")
print(combined_df.dtypes)
print(combined_df.head())

This dataset has 88362 rows and 3 columns.
Columns & data types in the dataset:
Service_Station_Location     object
Latitude                    float64
Longitude                   float64
dtype: object
                            Service_Station_Location   Latitude   Longitude
0  9 Murray Rose,SYDNEY OLYMPIC PARK NSW 2127,AUS... -33.845898  151.069768
1           76 Cowper St,WALLSEND NSW 2287,AUSTRALIA -32.902780  151.669841
2  Hunter Valley Gardens 2090 Broke Road,POKOLBIN... -32.773929  151.293163
3  Cnr Hume Highway & Bessemer Street,MITTAGONG N... -34.449540  150.442926
4              140 Queen St,BERRY NSW 2535,AUSTRALIA -34.775939  150.700542
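
Beyond the shape and data types, a quick range check can catch rows where latitude and longitude were swapped or left empty. This is an optional sketch and not part of the original workflow; the helper name `find_invalid_coordinates` and the toy data are assumptions. It relies only on the `Latitude` and `Longitude` column names used above.

```python
import pandas as pd

def find_invalid_coordinates(df):
    """Hypothetical helper: return rows whose coordinates are missing or out of range."""
    # between() is inclusive and returns False for NaN, so missing values are flagged too
    valid = df["Latitude"].between(-90, 90) & df["Longitude"].between(-180, 180)
    return df[~valid]

# Made-up rows: one valid, one with latitude and longitude swapped, one with a missing value
check_df = pd.DataFrame({
    "Service_Station_Location": ["Valid St", "Swapped Rd", "Missing Ave"],
    "Latitude": [-41.2, 174.9, None],
    "Longitude": [174.9, -41.2, 170.0],
})

invalid = find_invalid_coordinates(check_df)
print(invalid)
```

On the real data, `find_invalid_coordinates(combined_df)` would list any rows worth inspecting before the file is used elsewhere.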
