# Analyze Electric Vehicle Stations

## Task 1

- Import the CSV file `stations.csv` and assign it to the variable `stations`
- Display the first five (5) rows of the DataFrame

```python
import pandas as pd

stations = pd.read_csv('https://raw.githubusercontent.com/cogxen/datasets/main/for-laboratory/exploring-data-with-python/sorting-and-filtering/stations.csv')
stations.head()
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 0 | biodiesel | AL | government | private | 8 |
| 1 | biodiesel | AL | private | public | 2 |
| 2 | biodiesel | AR | government | private | 1 |
| 3 | biodiesel | AR | private | public | 16 |
| 4 | biodiesel | AZ | government | private | 74 |
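The cell above reads from a URL, so it will not run without network access. The sketch below works offline. It assumes you paste a few rows into a string (these five are copied from the `head()` output above, not the full dataset) and uses `io.StringIO` in place of the URL. It then runs two quick structural checks, shape and column types, before any further analysis.

```python
import io

import pandas as pd

# Five rows copied from the head() output above; not the full dataset.
csv_text = """fuel,state,owner,access,number_of_stations
biodiesel,AL,government,private,8
biodiesel,AL,private,public,2
biodiesel,AR,government,private,1
biodiesel,AR,private,public,16
biodiesel,AZ,government,private,74
"""

# StringIO makes the string behave like a file, so read_csv parses it as CSV.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # object for the text columns, an integer type for the counts
```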


## Task 2

When we start working with a new dataset, it's a good idea to do the following:
- Get some summaries of the different columns, so that we know what kinds of values they contain
- Display the different kinds of `fuel` included in the dataset and their counts

```python
fuel_counts = stations['fuel'].value_counts()
fuel_counts
```

```
fuel
electric                  260
compressed natural gas    210
propane                   105
ethanol                    97
biodiesel                  70
liquefied natural gas      50
hydrogen                   27
Name: count, dtype: int64
```
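Raw counts are often easier to compare as shares. The seven fuel counts above sum to 819, which matches the row count in Task 5, so electric rows make up about 32% (260 / 819) of the dataset. `value_counts(normalize=True)` computes those proportions directly. Here is a small sketch on a hypothetical toy Series; in the notebook you would call it on `stations['fuel']`:

```python
import pandas as pd

# Hypothetical toy column standing in for stations['fuel'].
fuel = pd.Series(['electric', 'electric', 'propane', 'hydrogen'], name='fuel')

counts = fuel.value_counts()                # absolute counts, largest first
shares = fuel.value_counts(normalize=True)  # proportions that sum to 1.0

print(counts)
print(shares)
```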

## Task 3

- Display the different values of `owner` included in the dataset and their counts

```python
owner_counts = stations['owner'].value_counts()
owner_counts
```

```
owner
private            413
government         288
utility company    108
joint               10
Name: count, dtype: int64
```
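By default `value_counts()` leaves out missing values, so blank cells in the CSV would not appear in the output. Here the four `owner` counts add up to 413 + 288 + 108 + 10 = 819, the full row count reported in Task 5, so no owner values are missing. When the counts do not add up, `dropna=False` gives missing values their own row. A sketch on a hypothetical toy column with one gap:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing owner.
owner = pd.Series(['private', 'government', np.nan, 'private'], name='owner')

print(owner.value_counts())              # NaN excluded: counts total 3
print(owner.value_counts(dropna=False))  # NaN gets its own row: counts total 4

# Counts plus missing values should always equal the number of rows.
assert owner.value_counts().sum() + owner.isna().sum() == len(owner)
```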

## Task 4

- Display the different values of `access` included in the dataset and their counts

```python
access_counts = stations['access'].value_counts()
access_counts
```

```
access
public     415
private    404
Name: count, dtype: int64
```
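The `owner` and `access` columns say more together than apart. `pd.crosstab` counts every combination of the two in a single table. The sketch below uses six rows copied from outputs in this notebook, so its counts are illustrative and are not the dataset totals:

```python
import pandas as pd

# Six rows copied from outputs in this notebook (a tiny subset).
sample = pd.DataFrame({
    'owner': ['government', 'private', 'government', 'private',
              'government', 'utility company'],
    'access': ['private', 'public', 'private', 'public', 'private', 'public'],
})

# Rows are owner values, columns are access values, cells are row counts.
table = pd.crosstab(sample['owner'], sample['access'])
print(table)
```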

## Task 5

- Display the statistical summary of the `number_of_stations` column

```python
number_of_stations_description = stations['number_of_stations'].describe()
number_of_stations_description
```

```
count     819.000000
mean       34.741148
std       118.665095
min         1.000000
25%         2.000000
50%         5.000000
75%        20.500000
max      2423.000000
Name: number_of_stations, dtype: float64
```
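This summary points to a strongly right-skewed distribution. The mean (about 34.7) is roughly seven times the median (5), and the maximum (2423) is far above the 75th percentile (20.5). `describe()` takes a `percentiles` argument for looking deeper into the upper tail, and `quantile()` returns the same values directly. A sketch on hypothetical toy values, with 0.9 and 0.99 as example cut-offs:

```python
import pandas as pd

# Hypothetical toy values standing in for stations['number_of_stations'].
counts = pd.Series([1, 1, 2, 5, 8, 16, 74, 2423], name='number_of_stations')

# Request extra upper-tail percentiles alongside the median.
summary = counts.describe(percentiles=[0.5, 0.9, 0.99])
print(summary)

# quantile() gives the same numbers without the rest of the summary.
print(counts.quantile([0.5, 0.9, 0.99]))

# A mean far above the median is a quick sign of right skew.
print(counts.mean() > counts.median())
```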

## Task 6 

The output of **Task 5** shows a very large maximum: 2423 stations, compared with a 75th percentile of 20.5.

- Sort `stations` by `number_of_stations` from largest to smallest
- Display the first five (5) rows of the result

```python
sort_by_number_of_stations_desc = stations.sort_values(by='number_of_stations', ascending=False)
sort_by_number_of_stations_desc.head()
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 301 | electric | CA | private | public | 2423 |
| 456 | electric | NY | private | public | 1173 |
| 500 | electric | TX | private | public | 778 |
| 329 | electric | FL | private | public | 659 |
| 510 | electric | VA | private | public | 497 |
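Sorting the whole frame just to keep the top rows works fine here. pandas also provides `DataFrame.nlargest(n, columns)`, which its documentation describes as equivalent to `sort_values(..., ascending=False).head(n)` but more performant. A sketch on rows copied from the outputs above:

```python
import pandas as pd

# Rows copied from outputs in this notebook (state and count only).
sample = pd.DataFrame({
    'state': ['AL', 'CA', 'NY', 'TX', 'FL', 'VA', 'AR'],
    'number_of_stations': [8, 2423, 1173, 778, 659, 497, 1],
})

# Full descending sort, then keep the first three rows.
top_sorted = sample.sort_values(by='number_of_stations', ascending=False).head(3)

# Same rows without sorting the whole frame.
top_n = sample.nlargest(3, 'number_of_stations')

print(top_n)
```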


## Task 7

- Sort `stations` by `number_of_stations` from smallest to largest
- Display the first five (5) rows of the result

```python
sort_by_number_of_stations_asc = stations.sort_values(by='number_of_stations', ascending=True)
sort_by_number_of_stations_asc.head()
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 194 | compressed natural gas | NJ | joint | public | 1 |
| 203 | compressed natural gas | NV | government | private | 1 |
| 432 | electric | NH | government | private | 1 |
| 208 | compressed natural gas | NY | joint | public | 1 |
| 730 | propane | CT | private | private | 1 |
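Every row above ties at the minimum of 1 station. The default sort algorithm (`kind='quicksort'`) does not guarantee any particular order among tied rows, which is why these five look arbitrary. Adding a second sort key makes the order reproducible. A sketch on rows copied from the outputs above, using `state` as the tie-breaker (an illustrative choice):

```python
import pandas as pd

# Rows copied from outputs in this notebook.
sample = pd.DataFrame({
    'fuel': ['compressed natural gas', 'compressed natural gas', 'electric',
             'compressed natural gas', 'propane', 'biodiesel'],
    'state': ['NJ', 'NV', 'NH', 'NY', 'CT', 'AL'],
    'number_of_stations': [1, 1, 1, 1, 1, 8],
})

# Sort by count first, then alphabetically by state to break ties.
ordered = sample.sort_values(by=['number_of_stations', 'state'], ascending=[True, True])
print(ordered)
```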


## Task 8  

Electric vehicles have become increasingly central to plans for tackling climate change. Let's take a closer look at stations that anyone can access.

- Create a boolean mask that is `True` for any row of `stations` where `access` is `'public'`

NOTE: Just create a **boolean mask**; there is no need to display anything.

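```python
# True for every row whose access type is public
access_public = stations['access'] == 'public'

# Optional check: preview the rows the mask selects
stations[access_public]
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 1 | biodiesel | AL | private | public | 2 |
| 3 | biodiesel | AR | private | public | 16 |
| 5 | biodiesel | AZ | private | public | 2 |
| 7 | biodiesel | CA | private | public | 23 |
| 9 | biodiesel | CO | private | public | 3 |
| ... | ... | ... | ... | ... | ... |
| 811 | propane | WA | private | public | 74 |
| 812 | propane | WA | utility company | public | 1 |
| 815 | propane | WI | private | public | 53 |
| 817 | propane | WV | private | public | 10 |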
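A boolean mask is a `bool` Series that shares the DataFrame's index, so it can be summarised directly. `sum()` counts the `True` values and `mean()` gives their share. On the full data, `access_public.sum()` should equal the 415 public rows from Task 4. A sketch on rows copied from the `head()` output in Task 1:

```python
import pandas as pd

# Access values of the five rows shown in Task 1.
sample = pd.DataFrame({
    'access': ['private', 'public', 'private', 'public', 'private'],
})

is_public = sample['access'] == 'public'  # bool Series, same index as sample

print(is_public.dtype)   # bool
print(is_public.sum())   # number of public rows
print(is_public.mean())  # share of public rows
```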


## Task 9

- Create a boolean mask that is `True` for any row of `stations` where `fuel` is `electric`

NOTE: Just create a **boolean mask**; there is no need to display anything.

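```python
# True for every row whose fuel type is electric
fuel_electric = stations['fuel'] == 'electric'

# Optional check: preview the rows the mask selects
stations[fuel_electric]
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 280 | electric | AK | government | private | 3 |
| 281 | electric | AK | government | public | 6 |
| 282 | electric | AK | private | public | 37 |
| 283 | electric | AK | utility company | public | 3 |
| 284 | electric | AL | government | private | 15 |
| ... | ... | ... | ... | ... | ... |
| 535 | electric | WV | private | public | 77 |
| 536 | electric | WY | government | private | 9 |
| 537 | electric | WY | government | public | 8 |
| 538 | electric | WY | private | private | 1 |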
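To match several values at once, `Series.isin` builds the mask from a list, so you don't need to chain `==` comparisons with `|`. A sketch that selects both natural-gas fuels (an illustrative choice of values) and confirms that the two forms agree:

```python
import pandas as pd

# Hypothetical toy column standing in for stations['fuel'].
fuel = pd.Series(['electric', 'compressed natural gas', 'propane',
                  'liquefied natural gas', 'electric'])

# One call with a list of accepted values.
natural_gas = fuel.isin(['compressed natural gas', 'liquefied natural gas'])

# The equivalent chained comparison.
same_mask = (fuel == 'compressed natural gas') | (fuel == 'liquefied natural gas')

print(natural_gas.tolist())  # [False, True, False, True, False]
```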


## Task 10

Use the boolean masks from **Task 8** and **Task 9** to filter `stations` down to only rows that are both public-access and electric-fuel.

- Assign the result to variable `public_electric`
- Display the DataFrame 

```python
public_electric = stations[access_public & fuel_electric]
public_electric
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 281 | electric | AK | government | public | 6 |
| 282 | electric | AK | private | public | 37 |
| 283 | electric | AK | utility company | public | 3 |
| 285 | electric | AL | government | public | 1 |
| 287 | electric | AL | private | public | 102 |
| ... | ... | ... | ... | ... | ... |
| 531 | electric | WI | utility company | public | 1 |
| 533 | electric | WV | government | public | 11 |
| 535 | electric | WV | private | public | 77 |
| 537 | electric | WY | government | public | 8 |
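If the comparisons are written inline rather than stored in variables, each one needs its own parentheses. In Python, `&` binds more tightly than `==`. `DataFrame.query` is an alternative that reads closer to plain English. A sketch on rows copied from the outputs above:

```python
import pandas as pd

# Rows copied from outputs in this notebook.
sample = pd.DataFrame({
    'fuel': ['biodiesel', 'electric', 'electric', 'electric'],
    'state': ['AL', 'AK', 'AK', 'WY'],
    'access': ['public', 'private', 'public', 'public'],
    'number_of_stations': [2, 3, 6, 8],
})

# Parentheses are required around each comparison when combining with &.
inline = sample[(sample['access'] == 'public') & (sample['fuel'] == 'electric')]

# The same filter written as a query string.
queried = sample.query("access == 'public' and fuel == 'electric'")

print(inline)
```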


## Task 11

- Sort `public_electric` by `number_of_stations` from smallest to largest
- Display the top five (5) rows (corresponding to the smallest numbers of stations) 

```python
# Sort ascending and keep the five rows with the fewest stations
public_electric.sort_values(by='number_of_stations', ascending=True).head()
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 454 | electric | NY | joint | public | 1 |
| 473 | electric | OR | utility company | public | 1 |
| 317 | electric | DC | government | public | 1 |
| 458 | electric | NY | utility company | public | 1 |
| 479 | electric | RI | government | public | 1 |
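Many public electric rows share the minimum of 1 station, so keeping exactly five rows cuts that tie arbitrarily. With `keep='all'`, `nsmallest` returns every row tied with the last value kept, even when that is more than `n` rows. A sketch on rows copied from the outputs above:

```python
import pandas as pd

# Rows copied from outputs in this notebook.
sample = pd.DataFrame({
    'state': ['NY', 'OR', 'DC', 'NY', 'RI', 'AK', 'AK'],
    'owner': ['joint', 'utility company', 'government', 'utility company',
              'government', 'government', 'utility company'],
    'number_of_stations': [1, 1, 1, 1, 1, 6, 3],
})

first_three = sample.nsmallest(3, 'number_of_stations')          # exactly 3 rows
all_tied = sample.nsmallest(3, 'number_of_stations', keep='all')  # every row tied at 1

print(len(first_three), len(all_tied))  # 3 5
```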


## Task 12

We can't be sure the trend holds for every row, but it certainly looks as if publicly-owned stations (government, utility company, or joint) have fewer chargers than privately-owned ones. The five largest counts in **Task 6** are all privately owned, while the five smallest public electric rows in **Task 11** all have non-private owners. Let's now compare privately-owned and publicly-owned stations.

- Create a boolean mask that is `True` for each row of `public_electric` where `owner` is `'private'`

NOTE: Just create a **boolean mask**; there is no need to display anything.

```python
owner_is_private = public_electric['owner'] == 'private'
```

## Task 13

- Apply the Boolean mask from **Task 12** to filter `public_electric` down to only privately-owned rows
- Assign the result to the variable `privately_owned`
- Display the DataFrame

```python
privately_owned = public_electric[owner_is_private]
privately_owned
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 282 | electric | AK | private | public | 37 |
| 287 | electric | AL | private | public | 102 |
| 292 | electric | AR | private | public | 70 |
| 297 | electric | AZ | private | public | 294 |
| 301 | electric | CA | private | public | 2423 |
| 307 | electric | CO | private | public | 277 |
| 313 | electric | CT | private | public | 218 |
| 319 | electric | DC | private | public | 62 |
| 325 | electric | DE | private | public | 30 |
| 329 | electric | FL | private | public | 659 |


## Task 14

Let's check how many states have privately-owned, publicly-accessible electric charging stations.

- Display the statistical summary of the `state` column of `privately_owned`

```python
privately_owned['state'].describe()
```

```
count     51
unique    51
top       AK
freq       1
Name: state, dtype: object
```
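For a text column, `describe()` reports `count`, `unique`, `top` (the most frequent value) and `freq` (how often that value occurs). In the output above, `unique` equals `count` at 51 and `freq` is 1, so each of the 50 states and DC appears exactly once. That also means `top` is only one of 51 equally frequent values and tells us nothing on its own. A sketch that computes the same four figures directly on a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy column standing in for a 'state' column.
state = pd.Series(['AK', 'AL', 'AR', 'NY', 'NY', 'NY'], name='state')

summary = state.describe()  # count, unique, top, freq
print(summary)

# The same figures computed one by one.
print(state.count(), state.nunique())
print(state.value_counts().idxmax(), state.value_counts().max())
```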

## Task 15

Let's compare this to the publicly-owned stations.

- Apply the inverse (`~`) of the boolean mask you created in **Task 12** to filter `public_electric` down to only rows with non-private ownership
- Assign the result to the variable `not_privately_owned`
- Display the DataFrame

```python
not_privately_owned = public_electric[~owner_is_private]
not_privately_owned
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 281 | electric | AK | government | public | 6 |
| 283 | electric | AK | utility company | public | 3 |
| 285 | electric | AL | government | public | 1 |
| 290 | electric | AR | government | public | 8 |
| 293 | electric | AR | utility company | public | 1 |
| ... | ... | ... | ... | ... | ... |
| 525 | electric | WA | utility company | public | 4 |
| 527 | electric | WI | government | public | 5 |
| 531 | electric | WI | utility company | public | 1 |
| 533 | electric | WV | government | public | 11 |
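`~` flips every value of a boolean mask, so `~owner_is_private` selects government, utility-company, and joint owners together. A mask and its inverse split the rows with no overlap and nothing left over. `privately_owned` has 51 rows (Task 14) and `not_privately_owned` has 83 (Task 16), so `public_electric` must contain 134 rows. A sketch of that property on a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy column standing in for public_electric['owner'].
owner = pd.Series(['private', 'government', 'utility company', 'joint', 'private'])

is_private = owner == 'private'
not_private = ~is_private  # flips True/False element-wise

print(not_private.tolist())  # [False, True, True, True, False]

# No row is in both subsets, and together they cover every row.
assert (is_private & not_private).sum() == 0
assert is_private.sum() + not_private.sum() == len(owner)
```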


## Task 16

Let's check how many states have publicly-owned, publicly-accessible electric charging stations.

- Display the statistical summary of the `state` column of `not_privately_owned`

```python
not_privately_owned['state'].describe()
```

```
count     83
unique    49
top       NY
freq       3
Name: state, dtype: object
```

## Task 17

Let's investigate publicly-owned stations a bit further.

- Display the statistical summary of the `number_of_stations` column of `not_privately_owned`

```python
not_privately_owned['number_of_stations'].describe()
```

```
count     83.000000
mean      17.927711
std       44.294409
min        1.000000
25%        1.000000
50%        6.000000
75%       17.000000
max      361.000000
Name: number_of_stations, dtype: float64
```
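This summary lumps government, utility-company, and joint owners together. Splitting it by `owner` with `groupby` shows whether one owner type accounts for the large values. A sketch on rows copied from the outputs above (a small subset, so the figures are illustrative only):

```python
import pandas as pd

# Rows copied from outputs in this notebook.
sample = pd.DataFrame({
    'owner': ['government', 'utility company', 'government', 'government',
              'utility company', 'government'],
    'number_of_stations': [6, 3, 1, 8, 1, 361],
})

# One summary row per owner type.
by_owner = sample.groupby('owner')['number_of_stations'].agg(['count', 'median', 'max'])
print(by_owner)
```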

## Task 18

The maximum number of stations (361) is far larger than the 75th percentile (17). Let's look at the rows between these values.

- Create a boolean mask that is `True` for each row of `not_privately_owned` where `number_of_stations` is greater than 17, the 75th percentile
- Assign the result to the variable `number_of_stations_75`
- Display the result

```python
number_of_stations_75 = not_privately_owned['number_of_stations'] > 17
number_of_stations_75
```

```
281    False
283    False
285    False
290    False
293    False
       ...  
525    False
527    False
531    False
533    False
537    False
Name: number_of_stations, Length: 83, dtype: bool
```
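Hard-coding 17 only works while the data stays the same. A sketch that reads the threshold from the data with `Series.quantile(0.75)` instead. By default `quantile` uses linear interpolation, which is also what `describe()` uses, so the cut-off need not be one of the observed values:

```python
import pandas as pd

# Hypothetical toy values standing in for not_privately_owned['number_of_stations'].
counts = pd.Series([6, 3, 1, 8, 1, 361, 40, 17, 19], name='number_of_stations')

q75 = counts.quantile(0.75)  # 75th percentile, linear interpolation
above_q75 = counts > q75     # True where the count exceeds it

print(q75)              # 19.0
print(above_q75.sum())  # 2
```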

## Task 19

- Filter `not_privately_owned` down to only rows with `number_of_stations` greater than 17
- Assign the result to the variable `above_17`
- Display the DataFrame

```python
above_17 = not_privately_owned[number_of_stations_75]
above_17
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 299 | electric | CA | government | public | 361 |
| 303 | electric | CA | utility company | public | 40 |
| 305 | electric | CO | government | public | 59 |
| 311 | electric | CT | government | public | 106 |
| 327 | electric | FL | government | public | 32 |
| 331 | electric | FL | utility company | public | 31 |
| 367 | electric | KS | utility company | public | 21 |
| 378 | electric | MA | government | public | 22 |
| 384 | electric | MD | government | public | 30 |
| 390 | electric | ME | government | public | 31 |
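Task 18 frames the question as the rows between the 75th percentile and the maximum. `Series.between` expresses that range directly. A sketch, assuming pandas 1.3 or later for the string form of `inclusive`, using counts copied from the outputs above:

```python
import pandas as pd

# Counts copied from outputs in this notebook.
counts = pd.Series([1, 6, 17, 18, 31, 106, 361], name='number_of_stations')

low, high = 17, counts.max()

# inclusive='right' keeps values with low < x <= high.
in_range = counts.between(low, high, inclusive='right')

print(counts[in_range].tolist())  # [18, 31, 106, 361]
```

Because the upper bound is the maximum, this is the same mask as `counts > 17`. `between` is more useful when both ends of the range actually cut off rows.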


## Task 20

-  Sort `above_17` by `number_of_stations`
-  Display the entirety of the result

```python
above_17.sort_values(by='number_of_stations')
```

|   | fuel | state | owner | access | number_of_stations |
|---|---|---|---|---|---|
| 423 | electric | NC | utility company | public | 18 |
| 521 | electric | WA | government | public | 19 |
| 398 | electric | MI | utility company | public | 19 |
| 508 | electric | VA | government | public | 19 |
| 367 | electric | KS | utility company | public | 21 |
| 378 | electric | MA | government | public | 22 |
| 394 | electric | MI | government | public | 22 |
| 438 | electric | NJ | government | public | 26 |
| 384 | electric | MD | government | public | 30 |
| 390 | electric | ME | government | public | 31 |
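Jupyter truncates long DataFrames according to the `display.max_rows` option and shows `...` in the middle. To display every row of a result, one approach is to lift that limit temporarily with `pd.option_context`, which restores the previous setting when the `with` block ends. A sketch on a hypothetical 100-row frame:

```python
import pandas as pd

# Hypothetical frame long enough to be truncated under default settings.
df = pd.DataFrame({'number_of_stations': range(100)})
default_limit = pd.get_option('display.max_rows')

# Inside the block there is no row limit, so all 100 rows are printed.
with pd.option_context('display.max_rows', None):
    inside_limit = pd.get_option('display.max_rows')
    print(df)

# The original limit is back in effect here.
after_limit = pd.get_option('display.max_rows')
```

For a one-off printout, `print(above_17.sort_values(by='number_of_stations').to_string())` also shows every row without changing any options.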
