# Numpy - New York City Taxi Data Analysis

In these project-based exercises, selected fundamental to intermediate NumPy operations are applied to do calculations.  The New York City taxi dataframe is converted into an array so that vectorized, mathematical operations can be performed on the data.  

Fundamental NumPy operations and calculations used in these project-based exercises are as follows:  
* Selecting and slicing rows and columns
* Using the coefficient of variation to determine the variability of the data
* Changing an image to a NumPy ndarray
* Vector maths, e.g. determining the miles per hour for each trip using vectors (columns) and determining the total amount paid by a client

Intermediate operations and calculations in this notebook are as follows:  
* Creating Boolean arrays to aid various calculations, e.g. calculating how many trips were made to or from a specific airport, finding the most popular airport and removing outliers from the array
* Assigning specific values to sliced rows and columns (e.g. changing the format of the year column from 2016 to 16 and correcting wrong values in the array)

The NumPy exercises were done through DataQuest, which also provided the dataset nyc_taxis.csv.  Originally these exercises were done using pure Python, but I converted the code to run in this Jupyter Notebook.  The original Python code is also available in the same file as this notebook.


![nyc_taxi_business.png](attachment:nyc_taxi_business.png)

#### Import Numpy

In [247]:
import pandas as pd
import numpy as np

#### Import the data

In [250]:
# Import the data from nyc_taxis.csv
taxis = pd.read_csv("nyc_taxis.csv")

In [253]:
# Displaying the last 5 rows of the data:

taxis.tail()

Unnamed: 0,pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
2008,2016,6,30,4,5,3,4,9.5,1989,31.0,1.3,5.54,3.0,40.84,1
2009,2016,6,30,4,5,2,4,19.8,2368,52.0,0.8,5.54,0.0,58.34,1
2010,2016,6,30,4,5,2,4,17.48,2822,52.0,0.8,5.54,5.0,63.34,1
2011,2016,6,30,4,5,2,6,12.76,1083,34.5,1.3,0.0,8.95,44.75,1
2012,2016,6,30,4,5,2,0,17.54,1711,48.0,1.3,5.54,0.0,54.84,2


#### Convert the dataframe to a list of lists

In [254]:
# Converting the dataframe into a list of lists

ny_taxis = taxis.values.tolist()

In [255]:
# Viewing the first 3 rows of the list of lists, which is the same as the first entries (rows) in our dataframe. 
ny_taxis[0:3]

[[2016.0,
  1.0,
  1.0,
  5.0,
  0.0,
  2.0,
  4.0,
  21.0,
  2037.0,
  52.0,
  0.8,
  5.54,
  11.65,
  69.99,
  1.0],
 [2016.0,
  1.0,
  1.0,
  5.0,
  0.0,
  2.0,
  1.0,
  16.29,
  1520.0,
  45.0,
  1.3,
  0.0,
  8.0,
  54.3,
  1.0],
 [2016.0,
  1.0,
  1.0,
  5.0,
  0.0,
  2.0,
  6.0,
  12.7,
  1462.0,
  36.5,
  1.3,
  0.0,
  0.0,
  37.8,
  2.0]]

#### Convert the list of lists to an array without the column names

In [256]:
# To be able to use Numpy we will convert the list of lists to an array.

ny_taxis_array1 = np.array(ny_taxis)

In [257]:
# Displaying the shape of our array.  Without the column names we have 2013 entries (rows) and 15 columns.

ny_taxis_array1.shape

(2013, 15)
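
The list-of-lists step above is not strictly required.  As a minimal sketch, assuming a small stand-in dataframe (`demo_df` and its values are illustrative, not the real taxi data), pandas can return the ndarray directly with `DataFrame.to_numpy()`:

```python
import numpy as np
import pandas as pd

# Small stand-in for the taxi dataframe (values are illustrative only).
demo_df = pd.DataFrame({
    "pickup_year": [2016, 2016],
    "trip_distance": [21.0, 16.29],
    "payment_type": [1, 2],
})

# Route used in this notebook: dataframe -> list of lists -> ndarray.
via_list = np.array(demo_df.values.tolist())

# Shorter route: ask pandas for the ndarray directly.
direct = demo_df.to_numpy(dtype=float)

print(direct.shape)                      # (2, 3)
print(np.array_equal(via_list, direct))  # True
```

Both routes give the same float array, so the choice is mostly a matter of readability.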

In [258]:
np.set_printoptions(precision=4, suppress = True)

In [259]:
ny_taxis_array = ny_taxis_array1.astype(float)
ny_taxis_array

array([[2016.  ,    1.  ,    1.  , ...,   11.65,   69.99,    1.  ],
       [2016.  ,    1.  ,    1.  , ...,    8.  ,   54.3 ,    1.  ],
       [2016.  ,    1.  ,    1.  , ...,    0.  ,   37.8 ,    2.  ],
       ...,
       [2016.  ,    6.  ,   30.  , ...,    5.  ,   63.34,    1.  ],
       [2016.  ,    6.  ,   30.  , ...,    8.95,   44.75,    1.  ],
       [2016.  ,    6.  ,   30.  , ...,    0.  ,   54.84,    2.  ]])

In [260]:
# Viewing the first 5 entries of the array.  The values are displayed as decimal numbers.

ny_taxis_array[0:5]

array([[2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    4.  ,
          21.  , 2037.  ,   52.  ,    0.8 ,    5.54,   11.65,   69.99,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    1.  ,
          16.29, 1520.  ,   45.  ,    1.3 ,    0.  ,    8.  ,   54.3 ,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
          12.7 , 1462.  ,   36.5 ,    1.3 ,    0.  ,    0.  ,   37.8 ,
           2.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
           8.7 , 1210.  ,   26.  ,    1.3 ,    0.  ,    5.46,   32.76,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
           5.56,  759.  ,   17.5 ,    1.3 ,    0.  ,    0.  ,   18.8 ,
           2.  ]])

### Selecting and slicing rows and columns of ndarrays

#### Select the following rows and columns:

In [261]:
# Selecting and Slicing Rows and Items from ndarrays

# Displaying row 1 (index 0) from our array

row_0 = ny_taxis_array[0]
row_0

array([2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    4.  ,
         21.  , 2037.  ,   52.  ,    0.8 ,    5.54,   11.65,   69.99,
          1.  ])

In [262]:
# Display rows 391 to 500

rows_391_to_500 = ny_taxis_array[391:501]
rows_391_to_500

array([[2016.  ,    1.  ,    2.  , ...,    0.  ,   26.3 ,    2.  ],
       [2016.  ,    1.  ,    2.  , ...,    3.  ,   30.3 ,    1.  ],
       [2016.  ,    1.  ,    2.  , ...,    6.67,   40.01,    1.  ],
       ...,
       [2016.  ,    1.  ,    4.  , ...,    0.  ,   55.34,    2.  ],
       [2016.  ,    1.  ,    4.  , ...,    3.09,   13.39,    1.  ],
       [2016.  ,    1.  ,    4.  , ...,    4.  ,   26.8 ,    1.  ]])

In [131]:
# Displaying the entry in row 21 column 5

row_21_column_5 = ny_taxis_array[21,5]
row_21_column_5

4.0

In [132]:
# Selecting Columns 1,4 and 7:

columns_1_4_7 = ny_taxis_array[0:, [1,4,7]]
columns_1_4_7

array([[ 1.  ,  0.  , 21.  ],
       [ 1.  ,  0.  , 16.29],
       [ 1.  ,  0.  , 12.7 ],
       ...,
       [ 6.  ,  5.  , 17.48],
       [ 6.  ,  5.  , 12.76],
       [ 6.  ,  5.  , 17.54]])

In [263]:
# Selecting columns 5-8 for row 99:

row_99_columns_5_to_8 = ny_taxis_array[99,5:9]
row_99_columns_5_to_8

array([   2.  ,    4.  ,   20.91, 1744.  ])

In [134]:
# Selecting rows 100 - 200 for column 14:

rows_100_to_200_column_14 = ny_taxis_array[100:201, 14]
rows_100_to_200_column_14

array([2., 1., 1., 1., 1., 1., 2., 1., 1., 2., 1., 1., 1., 2., 2., 2., 1.,
       2., 1., 2., 1., 1., 2., 2., 2., 1., 1., 2., 1., 2., 1., 1., 2., 2.,
       1., 1., 2., 2., 1., 1., 1., 2., 1., 1., 1., 2., 2., 2., 2., 2., 1.,
       4., 2., 1., 2., 1., 2., 2., 2., 2., 1., 1., 2., 1., 2., 2., 2., 2.,
       1., 2., 2., 1., 2., 1., 2., 1., 2., 2., 1., 1., 1., 1., 2., 1., 1.,
       2., 2., 1., 1., 2., 2., 1., 1., 2., 1., 1., 1., 1., 1., 2., 2.])
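
The indexing patterns used in this section can be summarised on a toy array.  This is only a sketch; `demo` is a hypothetical 4 x 5 array standing in for `ny_taxis_array`:

```python
import numpy as np

# Toy 4x5 array standing in for ny_taxis_array.
demo = np.arange(20).reshape(4, 5)

single_row = demo[1]            # one row -> 1D array
row_block = demo[1:3]           # rows 1 and 2 (stop index 3 is excluded)
single_item = demo[2, 4]        # row 2, column 4 -> scalar
some_cols = demo[:, [0, 3]]     # every row, columns 0 and 3
col_slice = demo[1:4, 2]        # rows 1-3 of column 2 -> 1D array

print(row_block.shape)   # (2, 5)
print(single_item)       # 14
print(some_cols.shape)   # (4, 2)
print(col_slice)         # [ 7 12 17]
```

Because the stop index of a slice is excluded, selecting rows 391 to 500 needs `[391:501]`, as in the cell above.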

In [264]:
taxis

Unnamed: 0,pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
0,2016,1,1,5,0,2,4,21.00,2037,52.0,0.8,5.54,11.65,69.99,1
1,2016,1,1,5,0,2,1,16.29,1520,45.0,1.3,0.00,8.00,54.30,1
2,2016,1,1,5,0,2,6,12.70,1462,36.5,1.3,0.00,0.00,37.80,2
3,2016,1,1,5,0,2,6,8.70,1210,26.0,1.3,0.00,5.46,32.76,1
4,2016,1,1,5,0,2,6,5.56,759,17.5,1.3,0.00,0.00,18.80,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2008,2016,6,30,4,5,3,4,9.50,1989,31.0,1.3,5.54,3.00,40.84,1
2009,2016,6,30,4,5,2,4,19.80,2368,52.0,0.8,5.54,0.00,58.34,1
2010,2016,6,30,4,5,2,4,17.48,2822,52.0,0.8,5.54,5.00,63.34,1
2011,2016,6,30,4,5,2,6,12.76,1083,34.5,1.3,0.00,8.95,44.75,1


### Vector Maths

#### Select the fare_amount and fees_amount columns:

In [265]:
# Selecting the fare_amount ($) column:

fare_amount = ny_taxis_array[:,9]
fare_amount

array([52. , 45. , 36.5, ..., 52. , 34.5, 48. ])

In [266]:
# Selecting the fees_amount ($) column, index 10:

fees_amount = ny_taxis_array[:,10]
fees_amount

array([0.8, 1.3, 1.3, ..., 0.8, 1.3, 1.3])

#### Calculate the sum of the fare_amounts and fees_amounts using vector operations:

In [267]:
fare_and_fees = fare_amount + fees_amount
fare_and_fees

array([52.8, 46.3, 37.8, ..., 52.8, 35.8, 49.3])

#### Select the trip_distance column:

In [268]:
# Selecting the column containing the distances (in miles):

trip_distance_miles = ny_taxis_array[:,7]
trip_distance_miles                                    

array([21.  , 16.29, 12.7 , ..., 17.48, 12.76, 17.54])

#### Select the trip_length column:

In [269]:
# Selecting the column containing the trip length (in seconds):

trip_length_seconds = ny_taxis_array[:,8]
trip_length_seconds

array([2037., 1520., 1462., ..., 2822., 1083., 1711.])

#### Determine the miles per hour for each trip using vector operations:

In [270]:
# Determining the miles per hour for each trip:

trip_length_hours = trip_length_seconds / 3600  # 3600 seconds is one hour
trip_mph = trip_distance_miles / trip_length_hours
trip_mph

array([37.1134, 38.5816, 31.2722, ..., 22.2991, 42.4155, 36.9047])
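
The same vector calculation can be made a little more defensive.  The sketch below uses illustrative values (the third trip is invented to have a duration of 0 seconds) and divides only where the duration is positive, so that a zero duration gives NaN instead of a division warning:

```python
import numpy as np

# Illustrative values; the last trip has a recorded duration of 0 seconds.
distance_miles = np.array([21.0, 16.29, 5.0])
length_seconds = np.array([2037.0, 1520.0, 0.0])

length_hours = length_seconds / 3600  # 3600 seconds is one hour

# Divide only where the duration is positive; other entries stay NaN.
mph = np.full_like(distance_miles, np.nan)
np.divide(distance_miles, length_hours, out=mph, where=length_hours > 0)

print(np.round(mph[:2], 4))   # [37.1134 38.5816]
print(np.isnan(mph[2]))       # True
```

The first two speeds match the first two values of `trip_mph` above, since the same distances and durations were used.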

#### Determine the minimum, maximum and average miles per hour:

In [345]:
trip_std = trip_mph.std()
trip_std

2790.591571058061

In [343]:
mph_mean = trip_mph.mean()
mph_mean

169.99785562145897

In [None]:
# Since we have inaccuracies in our data, we cannot rely on the mean value of 170.00 mph.  This is also far too fast for a
# taxi to drive.

In [346]:
# Finding the coefficient of variation (CV), which is the standard deviation divided by the mean.  As a rule of thumb, if
# CV >= 1 the variation of our data points is high, and if CV < 1 the variation of our data points is low.

coef_var = trip_std / mph_mean
coef_var

16.415451600001255

In [347]:
# It is clear from the above calculation that the variability of our data points is very high.  We can also expect that there
# will be outliers in the data.
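
To make the CV calculation reusable, it could be wrapped in a small helper.  This is a sketch only: `coefficient_of_variation` is a hypothetical name, and the speeds are illustrative values chosen to show how a single extreme entry pushes the CV above 1:

```python
import numpy as np

def coefficient_of_variation(values):
    """Return std / mean for a 1D array (population std, like ndarray.std())."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# Illustrative speeds in mph; the last value mimics a data-entry outlier.
typical = np.array([20.0, 25.0, 30.0])
with_outlier = np.array([20.0, 25.0, 30.0, 82800.0])

print(coefficient_of_variation(typical) < 1)        # True
print(coefficient_of_variation(with_outlier) > 1)   # True
```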

In [271]:
# More statistics for the trip_mph series (1D ndarray):

mph_min = trip_mph.min()
mph_max = trip_mph.max()

mph_min

0.0

In [272]:
# From the minimum value of 0 miles per hour, we can deduce that some of the entered data is not accurate.
# Some entries might not have been filled in and were given a 0 instead of an actual value.

In [273]:
mph_max

82800.0

In [274]:
# From the maximum miles per hour we can further see that some of the data is not accurate, since a taxi cannot
# travel at 82 800 mph.  Values like this appear when the recorded trip length is only a few seconds for a trip of several
# miles, because dividing by a very small number of hours inflates the speed.

In [277]:
taxis.head()

Unnamed: 0,pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
0,2016,1,1,5,0,2,4,21.0,2037,52.0,0.8,5.54,11.65,69.99,1
1,2016,1,1,5,0,2,1,16.29,1520,45.0,1.3,0.0,8.0,54.3,1
2,2016,1,1,5,0,2,6,12.7,1462,36.5,1.3,0.0,0.0,37.8,2
3,2016,1,1,5,0,2,6,8.7,1210,26.0,1.3,0.0,5.46,32.76,1
4,2016,1,1,5,0,2,6,5.56,759,17.5,1.3,0.0,0.0,18.8,2


#### Determine the sum of the fare_amount, fees_amount, tolls_amount and tip_amount columns and verify that the values in the total_amount column are correct:
#### Use only the first 5 rows to show your calculations

In [149]:
# Determining if the total_amount has been calculated correctly

# We'll compare against the first 5 rows only

taxi_first_five = ny_taxis_array[:5]

# Selecting the following columns: fare_amount, fees_amount, tolls_amount, tip_amount
fare_components = taxi_first_five[:,9:13]
fare_sums = fare_components.sum(axis = 1)
fare_totals = taxi_first_five[:, 13]

fare_totals

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])

In [150]:
fare_sums

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])

In [151]:
# The fare_totals are the same as the fare_sums which we calculated for the first 5 rows.  The formula used in the Excel worksheet
# seems to be correct, at least for the 5 entries we compared.  This can also be checked in Excel to ensure that the same
# formula has been applied to all the rows.
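
The same check could be extended to every row without leaving NumPy.  The sketch below uses a small illustrative array (the third total is deliberately wrong) and `np.isclose`, which tolerates floating-point rounding when sums are compared:

```python
import numpy as np

# Illustrative rows laid out like columns 9-13 of the taxi array:
# fare_amount, fees_amount, tolls_amount, tip_amount, total_amount
demo = np.array([
    [52.0, 0.8, 5.54, 11.65, 69.99],
    [45.0, 1.3, 0.00,  8.00, 54.30],
    [36.5, 1.3, 0.00,  0.00, 99.99],   # deliberately wrong total
])

component_sums = demo[:, 0:4].sum(axis=1)
totals = demo[:, 4]

# True where the recorded total agrees with the sum of its components.
matches = np.isclose(component_sums, totals)

print(matches)                    # [ True  True False]
print(np.flatnonzero(~matches))   # [2]
```

On the real array the slices would be `ny_taxis_array[:, 9:13]` and `ny_taxis_array[:, 13]`, and `np.flatnonzero(~matches)` would list the rows to inspect.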

In [278]:
taxis

Unnamed: 0,pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
0,2016,1,1,5,0,2,4,21.00,2037,52.0,0.8,5.54,11.65,69.99,1
1,2016,1,1,5,0,2,1,16.29,1520,45.0,1.3,0.00,8.00,54.30,1
2,2016,1,1,5,0,2,6,12.70,1462,36.5,1.3,0.00,0.00,37.80,2
3,2016,1,1,5,0,2,6,8.70,1210,26.0,1.3,0.00,5.46,32.76,1
4,2016,1,1,5,0,2,6,5.56,759,17.5,1.3,0.00,0.00,18.80,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2008,2016,6,30,4,5,3,4,9.50,1989,31.0,1.3,5.54,3.00,40.84,1
2009,2016,6,30,4,5,2,4,19.80,2368,52.0,0.8,5.54,0.00,58.34,1
2010,2016,6,30,4,5,2,4,17.48,2822,52.0,0.8,5.54,5.00,63.34,1
2011,2016,6,30,4,5,2,6,12.76,1083,34.5,1.3,0.00,8.95,44.75,1


### The next exercises focus on Boolean Indexing with NumPy Arrays

#### Determine the number of taxi rides that took place in February:

In [279]:
ny_taxis_array

array([[2016.  ,    1.  ,    1.  , ...,   11.65,   69.99,    1.  ],
       [2016.  ,    1.  ,    1.  , ...,    8.  ,   54.3 ,    1.  ],
       [2016.  ,    1.  ,    1.  , ...,    0.  ,   37.8 ,    2.  ],
       ...,
       [2016.  ,    6.  ,   30.  , ...,    5.  ,   63.34,    1.  ],
       [2016.  ,    6.  ,   30.  , ...,    8.95,   44.75,    1.  ],
       [2016.  ,    6.  ,   30.  , ...,    0.  ,   54.84,    2.  ]])

In [280]:
pickup_month = ny_taxis_array[:,1]
february_bool = pickup_month == 2
february = pickup_month[february_bool]
# Find how many rides there were in February.
february_rides = february.shape[0]
february_rides

176

In [281]:
# There were 176 rides in February. 
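
Counting with a Boolean array does not require selecting the matching values first.  A minimal sketch on illustrative month values, showing three equivalent ways of counting:

```python
import numpy as np

# Illustrative pickup_month values.
months = np.array([1.0, 2.0, 2.0, 3.0, 2.0, 6.0])

is_february = months == 2

# Three equivalent ways of counting the True entries.
count_by_shape = months[is_february].shape[0]
count_by_sum = is_february.sum()          # True counts as 1, False as 0
count_by_nonzero = np.count_nonzero(is_february)

print(count_by_shape, count_by_sum, count_by_nonzero)   # 3 3 3
```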

#### Determine on which trips the client gave a tip of more than 50 dollars.  Only return columns 5-13 for these results.

In [282]:
tip_amount = ny_taxis_array[:,12]
tip_bool = tip_amount > 50
top_tips = ny_taxis_array[tip_bool, 5:14]
top_tips.shape

(1, 9)

In [283]:
top_tips

array([[   4.  ,    2.  ,   21.45, 2004.  ,   52.  ,    0.8 ,    0.  ,
          52.8 ,  105.6 ]])

In [284]:
# From the above calculation we can see that there was only one person who tipped an amount above $50.

#### The value at column index 5 and row index 1066 is incorrect.  Use assignment to change this value to 1.

In [285]:
ny_taxis_array[1066,5] = 1

In [286]:
ny_taxis_array[1066]

array([2016. ,    3. ,    4. ,    5. ,    1. ,    1. ,    3. ,   11.1,
       1899. ,   34. ,    0.8,    0. ,    0. ,   34.8,    2. ])

In [287]:
# Finding the unique values in the year column
year_col = ny_taxis_array[:,0]
np.unique(year_col)

array([2016.])

####  Change the year format in column one from YYYY to YY i.e 2016 will be 16. 

In [288]:
# Since there is only one unique value in the year column, which is the year 2016, we can easily replace
# the YYYY (2016) format with the YY format.
ny_taxis_array[:,0] = 16

#### The values at column index 7 for rows 550 and 551 are incorrect.  Change these values to the mean value of the column. 

In [289]:
ny_taxis_array[550:552, 7] = ny_taxis_array[:,7].mean()
ny_taxis_array[550:552, 7]

array([12.9248, 12.9248])

#### In column 14 (index 13), which is the total_amount column, if there are any values < 0, change these entries to 0.

In [290]:
total_amount = ny_taxis_array[:,13]
neg_values = total_amount < 0
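# Index column 13 as well, so that only the total_amount entry is set to 0 and not the whole row.
ny_taxis_array[neg_values, 13] = 0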

In [291]:
np.sum(neg_values)

0

In [292]:
# From the last calculation we can see that there were no negative values in the total_amount column.
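
Since the taxi data has no negative totals, the effect of this kind of assignment is easier to see on a toy array.  In this sketch (illustrative values only), a Boolean row mask is combined with a column index so that only one column of the flagged rows changes:

```python
import numpy as np

# Illustrative array: column 0 is a trip id, column 1 a total amount.
demo = np.array([
    [1.0, 12.5],
    [2.0, -3.0],
    [3.0, 40.0],
])

negative = demo[:, 1] < 0

# Row mask plus column index: only the total in the flagged row changes.
demo[negative, 1] = 0

print(demo[1])   # [2. 0.]
```

Using the mask on its own, as in `demo[negative] = 0`, would set every column of the flagged rows to 0, including the trip id.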

#### We want to add a new column to our array filled with zeros.  This column will be used for comparison:

In [334]:
zeros = np.zeros([ny_taxis_array.shape[0], 1])
new_taxis = np.concatenate([ny_taxis_array, zeros], axis = 1)
new_taxis

array([[16.  ,  1.  ,  1.  , ..., 69.99,  1.  ,  0.  ],
       [16.  ,  1.  ,  1.  , ..., 54.3 ,  1.  ,  0.  ],
       [16.  ,  1.  ,  1.  , ..., 37.8 ,  2.  ,  0.  ],
       ...,
       [16.  ,  6.  , 30.  , ..., 63.34,  1.  ,  0.  ],
       [16.  ,  6.  , 30.  , ..., 44.75,  1.  ,  0.  ],
       [16.  ,  6.  , 30.  , ..., 54.84,  2.  ,  0.  ]])

In [335]:
taxis

Unnamed: 0,pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
0,2016,1,1,5,0,2,4,21.00,2037,52.0,0.8,5.54,11.65,69.99,1
1,2016,1,1,5,0,2,1,16.29,1520,45.0,1.3,0.00,8.00,54.30,1
2,2016,1,1,5,0,2,6,12.70,1462,36.5,1.3,0.00,0.00,37.80,2
3,2016,1,1,5,0,2,6,8.70,1210,26.0,1.3,0.00,5.46,32.76,1
4,2016,1,1,5,0,2,6,5.56,759,17.5,1.3,0.00,0.00,18.80,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2008,2016,6,30,4,5,3,4,9.50,1989,31.0,1.3,5.54,3.00,40.84,1
2009,2016,6,30,4,5,2,4,19.80,2368,52.0,0.8,5.54,0.00,58.34,1
2010,2016,6,30,4,5,2,4,17.48,2822,52.0,0.8,5.54,5.00,63.34,1
2011,2016,6,30,4,5,2,6,12.76,1083,34.5,1.3,0.00,8.95,44.75,1


#### If the pickup location is an airport, we will assign a value of 1 in the new column for the corresponding row (entry).  We use Boolean arrays to do this:

In [295]:
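# Pickup location code 2 (JFK)
jfk_airport = new_taxis[: , 5] == 2
new_taxis[jfk_airport , 15] = 1
# Pickup location code 3 (LaGuardia)
laguardia_airport = new_taxis[: , 5] == 3
new_taxis[laguardia_airport , 15] = 1
# Pickup location code 5 (Newark)
newark_airport = new_taxis[: , 5] == 5
new_taxis[newark_airport , 15] = 1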
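
The three masks could also be built in one step with `np.isin`.  This is a sketch on illustrative location codes, not a replacement for the cell above; `airport_codes` is a hypothetical name for the list of codes used there:

```python
import numpy as np

# Illustrative pickup_location_code values; 2, 3 and 5 are treated as
# airport codes, matching the codes used in the cell above.
pickup_codes = np.array([2.0, 4.0, 3.0, 6.0, 5.0, 0.0])
airport_codes = [2, 3, 5]

# True wherever the pickup code is one of the airport codes.
is_airport = np.isin(pickup_codes, airport_codes)

# Converting the Boolean array to floats gives the 1/0 flag column.
airport_flag = is_airport.astype(float)

print(airport_flag)   # [1. 0. 1. 0. 1. 0.]
```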

#### Next we will determine which airport is the most popular drop-off location.  We compare the number of trips to each airport using Boolean arrays:

In [298]:
jfkdrop_off_location = new_taxis[: , 6] == 2
jfk = new_taxis[jfkdrop_off_location]
jfk_count = np.shape(jfk)[0]
jfk_count

285

In [300]:
ladrop_off_location = new_taxis[: , 6] == 3
laguardia = new_taxis[ladrop_off_location]
laguardia_count = np.shape(laguardia)[0]
laguardia_count

308

In [302]:
newdrop_off_location = new_taxis[: , 6] == 5
newark = new_taxis[newdrop_off_location]
newark_count = np.shape(newark)[0]
newark_count

2

In [303]:
# LaGuardia Airport is the most popular airport, with 308 drop-offs.
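
The three separate counts can also be obtained in one call with `np.unique(..., return_counts=True)`.  A minimal sketch on illustrative drop-off codes, assuming the same airport codes (2, 3 and 5) as above:

```python
import numpy as np

# Illustrative dropoff_location_code values.
dropoff_codes = np.array([2.0, 3.0, 3.0, 6.0, 5.0, 2.0, 3.0])

# Unique codes (sorted) and how often each one occurs.
codes, counts = np.unique(dropoff_codes, return_counts=True)

# Keep only the airport codes, then pick the one with the highest count.
airport_mask = np.isin(codes, [2, 3, 5])
busiest_code = codes[airport_mask][np.argmax(counts[airport_mask])]

print(codes)          # [2. 3. 5. 6.]
print(counts)         # [2 3 1 1]
print(busiest_code)   # 3.0
```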

In [336]:
trip_mph = new_taxis[:,7] / (new_taxis[:,8] / 3600)
cleaned_taxi = new_taxis[trip_mph < 100]
mean_distance = cleaned_taxi[:, 7].mean()
mean_length = cleaned_taxi[: , 8].mean()
mean_total_amount = cleaned_taxi[: , 13].mean()
trip_mph2 = cleaned_taxi[:, 7] / (cleaned_taxi[:,8] / 3600)
mean_mph_new = trip_mph2.mean()

In [309]:
cleaned_taxi.shape

(2004, 16)

In [None]:
# The new array, cleaned_taxi, contains only rows (entries) with trip_distances and trip_lengths which give a miles per hour of
# less than 100.  Viewing the shape of this new array we see that it contains 2004 rows, and thus there were 9 entries that had
# a miles per hour of 100 or greater.
# Having cleaned the data of inaccurate entries, we are now able to determine the mean miles per hour for all the entries with
# better accuracy.
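
The filtering step can be followed on a toy array.  In this sketch the rows are illustrative (the second one is invented to reproduce an 82 800 mph entry), and the 100 mph cut-off is the same one used above:

```python
import numpy as np

# Illustrative trips: column 0 is distance (miles), column 1 is duration (seconds).
trips = np.array([
    [21.0, 2037.0],
    [23.0,    1.0],   # 23 miles in 1 second -> 82 800 mph, clearly an error
    [12.7, 1462.0],
])

mph = trips[:, 0] / (trips[:, 1] / 3600)

# Boolean mask of plausible speeds; indexing with it drops the other rows.
keep = mph < 100
cleaned = trips[keep]

print(cleaned.shape)    # (2, 2)
print((~keep).sum())    # 1
```

Counting `~keep` gives the number of dropped rows directly, which is how the 9 removed entries above could be confirmed.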

In [337]:
mean_mph_new

24.455760628571785

In [338]:
# After dropping rows with inaccuracies, we see that the new mean miles per hour value is 24.46 mph.  This is much more realistic
# than the 170.00 mph we obtained earlier.

In [305]:
# Calculating the mean distance that clients travel:

mean_distance

12.907449879178554

In [306]:
# Calculating the mean length of time (in seconds) for a trip:

mean_length

2271.691117764471

In [339]:
# The mean total amount that a client pays per trip for taxi services is displayed:

mean_total_amount

48.70217065868263

In [348]:
# Calculating the standard deviation for the miles per hour vector from the cleaned_taxi array:

std_mph_new = trip_mph2.std()
std_mph_new

8.812015595706029

In [350]:
# Calculating the coefficient of variation (CV) for the miles per hour for the cleaned_taxi array:

coef_var2 = std_mph_new / mean_mph_new
coef_var2

0.36032474023363265

In [None]:
# From the last calculation we can see that, after outliers were eliminated from our data (the array), the coefficient of
# variation decreased drastically.  Our new CV = 0.36, while before the outliers were removed the CV = 16.42.
# We can deduce that the variability of our data for the new array, cleaned_taxi, is low.

### The future of taxi services

#### Many companies are working on flying taxis, which are known as passenger drones or electric vertical take-off and landing (eVTOL) aircraft.  Many hours are spent in traffic, especially in densely populated cities like New York, Los Angeles and London.  The EHang AAV is a two-seater drone that is being developed in China.  The British company Vertical Aerospace released an eVTOL prototype that completed its first flight in 2019.  Volocopter from Germany built its first air taxi, the 2X eVTOL aircraft, which can fly 22 miles at a top speed of 68 mph and completed its first test flight in Singapore in 2019.  Volocopter's aim is to provide a market for flying taxis for short distances above the busyness of the city streets, and it expects to start commercial flights in 2022.  The German startup Lilium has been testing its five-seater Lilium Jet and hopes to launch its passenger services across many locations by 2025.  Uber has also been working on its own flying taxis and hopes to launch its first piloted services by 2023 and autonomous, pilotless aircraft by 2030.

#### Reference article: Introducing The Mindboggling Flying Taxis Of The Future by Bernard Marr

![Flying_Taxis_Future.png](attachment:Flying_Taxis_Future.png)

#### For the sake of interest I will change this image to an array of values:

In [321]:
from matplotlib.image import imread

flying_taxi = imread('Flying_Taxis_Future.png')
type(flying_taxi)

numpy.ndarray

In [None]:
# From the above output we see that this image is an array.

In [322]:
flying_taxi.shape

(667, 1000, 4)

In [325]:
# flying_taxi is a 3D array.  It consists of 667 rows and 1000 columns of pixels, and each pixel holds 4 values (the red, green,
# blue and alpha channels).
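
How the third axis holds the colour channels can be seen on a tiny stand-in image.  This is a sketch only: `image` is a hypothetical 2 x 3 RGBA array filled with one colour, not the real picture:

```python
import numpy as np

# Tiny stand-in for an RGBA image: 2 rows x 3 columns x 4 channels.
image = np.zeros((2, 3, 4), dtype=np.float32)
image[..., 0] = 0.6549   # red channel
image[..., 1] = 0.5725   # green channel
image[..., 2] = 0.4902   # blue channel
image[..., 3] = 1.0      # alpha channel (fully opaque)

# Indexing the last axis with one number returns a 2D array for that channel.
green = image[:, :, 1]

# Slicing the last axis keeps it, here dropping only the alpha channel.
rgb = image[..., :3]

print(green.shape)   # (2, 3)
print(rgb.shape)     # (2, 3, 3)
```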

In [323]:
flying_taxi

array([[[0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        ...,
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ]],

       [[0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        ...,
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ]],

       [[0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        [0.6549, 0.5725, 0.4902, 1.    ],
        ...,
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ],
        [0.0549, 0.0588, 0.0588, 1.    ]],

       ...,

       [[0.498 , 0.4902, 0.4863, 1.    ],
        [0.498 , 0.4902, 0.4863, 1.    ],
        [0.498 , 0.4902, 0.4863, 1.    ],
        ...,
        [0.3176, 0.3216, 0.3294, 1.    ],
     

In [331]:
# Displaying channel 1 (green) of the image array.  Rows 450-459 and columns 100-120 of this channel are displayed below:

flying_taxi[450:460,100:121,1]

array([[0.7961, 0.8118, 0.7961, 0.7961, 0.7961, 0.7961, 0.7961, 0.7961,
        0.8118, 0.7961, 0.7961, 0.8118, 0.7961, 0.7961, 0.8118, 0.7961,
        0.8118, 0.7961, 0.7961, 0.8118, 0.7961],
       [0.7961, 0.7961, 0.7961, 0.7961, 0.7961, 0.7961, 0.7765, 0.7961,
        0.7765, 0.7961, 0.7961, 0.7961, 0.7765, 0.7961, 0.7961, 0.7765,
        0.7961, 0.7765, 0.7765, 0.7961, 0.7765],
       [0.7686, 0.7765, 0.7686, 0.7765, 0.7686, 0.7765, 0.7686, 0.7686,
        0.749 , 0.7765, 0.7686, 0.7765, 0.7765, 0.7765, 0.749 , 0.7686,
        0.7765, 0.7686, 0.7765, 0.749 , 0.7686],
       [0.7412, 0.749 , 0.749 , 0.749 , 0.7412, 0.749 , 0.749 , 0.7412,
        0.749 , 0.7412, 0.749 , 0.7412, 0.749 , 0.7412, 0.749 , 0.749 ,
        0.749 , 0.749 , 0.749 , 0.749 , 0.749 ],
       [0.7216, 0.7216, 0.749 , 0.7216, 0.749 , 0.7216, 0.7216, 0.7216,
        0.7216, 0.7216, 0.7216, 0.7216, 0.7216, 0.749 , 0.7216, 0.7216,
        0.7216, 0.7216, 0.7216, 0.7216, 0.7216],
       [0.7216, 0.7216, 0.7216, 0.7

### Thank you for viewing my notebook on NumPy.  I hope you enjoyed it!