# Stock prices of ride-sharing companies

As of this writing, four companies operate ride-sharing services in New York City: Uber, Lyft, Via, and Juno. Via is a private company, so no stock data is available for it. Juno was acquired by Gett in 2017, so no separate stock data is available for Juno either.

In this notebook, we download the stock data for Uber and Lyft from Yahoo Finance.

## 1. Download the Uber & Lyft stock data from Yahoo Finance

In [20]:
import os

import yfinance as yf

# Make sure the output folder exists; to_csv does not create it
os.makedirs('./data', exist_ok=True)

# Download Uber stock data (the end date is exclusive)
uber_stock = yf.download("UBER", start="2019-01-01", end="2023-12-31")
uber_stock.to_csv('./data/uber_stock.csv')

# Download Lyft stock data
lyft_stock = yf.download("LYFT", start="2019-01-01", end="2023-12-31")
lyft_stock.to_csv('./data/lyft_stock.csv')

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


## 2. Data Transformation

The stock price data does not contain rows for weekends and market holidays. We fill in the missing dates by carrying forward the previous trading day's prices and setting the volume to 0 for each filled-in day.

In [33]:
import pandas as pd
import datetime

def process_stock_data(file_path):
    # Load the CSV file into a DataFrame
    df = pd.read_csv(file_path)
    df['Date'] = pd.to_datetime(df['Date'])

    # Initialize an empty list to store the rows
    rows = []

    # Iterate through the DataFrame
    for i in range(len(df) - 1):
        # Add the current row to the list with year, month, and day
        current_row = df.iloc[i].copy()
        current_row['Year'] = current_row['Date'].year
        current_row['Month'] = current_row['Date'].month
        current_row['Day'] = current_row['Date'].day
        rows.append(current_row)

        # Calculate the difference in days between the current row and the next row
        days_diff = (df.loc[i + 1, 'Date'] - df.loc[i, 'Date']).days

        # If there's more than 1 day difference, fill in the missing days
        if days_diff > 1:
            for j in range(1, days_diff):
                # Create a new row with the previous day's data but with volume as 0
                new_row = df.iloc[i].copy()
                new_date = df.iloc[i]['Date'] + datetime.timedelta(days=j)
                new_row['Date'] = new_date
                new_row['Year'] = new_date.year
                new_row['Month'] = new_date.month
                new_row['Day'] = new_date.day
                new_row['Volume'] = 0
                rows.append(new_row)

    # Add the last row of the original DataFrame
    last_row = df.iloc[-1].copy()
    last_row['Year'] = last_row['Date'].year
    last_row['Month'] = last_row['Date'].month
    last_row['Day'] = last_row['Date'].day
    rows.append(last_row)

    # Convert the list of rows into a DataFrame
    updated_df = pd.DataFrame(rows)

    # Round all numeric columns to 2 decimal places
    numeric_cols = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
    updated_df[numeric_cols] = updated_df[numeric_cols].round(2)

    return updated_df
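
The row-by-row loop above can also be expressed with pandas' built-in resampling. The following is a minimal sketch, not a drop-in replacement: the function name `fill_missing_days` is hypothetical, it sorts by date and resets the index (which the loop does not), and it skips the rounding step. It assumes the input has `Date` and `Volume` columns like the CSV files written earlier.

```python
import pandas as pd


def fill_missing_days(df):
    """Fill calendar gaps by carrying the previous row's values forward.

    Hypothetical helper: assumes `df` has a 'Date' column and a 'Volume'
    column, plus any number of price columns.
    """
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])
    df = df.set_index("Date").sort_index()

    # Remember which dates were real trading days before resampling.
    traded = df.index

    # asfreq("D") inserts an all-NaN row for every missing calendar day;
    # ffill() then copies the last known values into those rows.
    filled = df.asfreq("D").ffill()

    # Inserted days had no trading, so their volume is set to 0.
    filled.loc[~filled.index.isin(traded), "Volume"] = 0
    filled["Volume"] = filled["Volume"].astype("int64")

    filled = filled.rename_axis("Date").reset_index()
    filled["Year"] = filled["Date"].dt.year
    filled["Month"] = filled["Date"].dt.month
    filled["Day"] = filled["Date"].dt.day
    return filled


# Two consecutive Uber trading days (a Friday and the following Monday).
sample = pd.DataFrame(
    {
        "Date": ["2019-05-10", "2019-05-13"],
        "Close": [41.57, 37.10],
        "Volume": [186322500, 79442400],
    }
)
result = fill_missing_days(sample)
print(result)
```

On large files this avoids building one `Series` per row, which is the main cost of the loop-based version.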


In [34]:

uber_file_path = './data/uber_stock.csv'
uber_processed_data = process_stock_data(uber_file_path)
uber_processed_data.head()

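| | Date | Open | High | Low | Close | Adj Close | Volume | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-10 | 42.0 | 45.0 | 41.06 | 41.57 | 41.57 | 186322500 | 2019 | 5 | 10 |
| 0 | 2019-05-11 | 42.0 | 45.0 | 41.06 | 41.57 | 41.57 | 0 | 2019 | 5 | 11 |
| 0 | 2019-05-12 | 42.0 | 45.0 | 41.06 | 41.57 | 41.57 | 0 | 2019 | 5 | 12 |
| 1 | 2019-05-13 | 38.79 | 39.24 | 36.08 | 37.1 | 37.1 | 79442400 | 2019 | 5 | 13 |
| 2 | 2019-05-14 | 38.31 | 39.96 | 36.85 | 39.96 | 39.96 | 46661100 | 2019 | 5 | 14 |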
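
The preview shows the two days after 2019-05-10 carrying that day's prices with a volume of 0. One way to confirm that the processed data has no remaining gaps is to check that consecutive dates are exactly one day apart. The sketch below uses a hypothetical helper, `has_daily_coverage`, and a hand-copied subset of the preview rows rather than the full file.

```python
import pandas as pd


def has_daily_coverage(df):
    """Return True if consecutive 'Date' values are exactly one day apart."""
    dates = pd.to_datetime(df["Date"]).sort_values()
    gaps = dates.diff().dropna()
    return bool((gaps == pd.Timedelta(days=1)).all())


# Dates and volumes copied from the Uber preview.
preview = pd.DataFrame(
    {
        "Date": ["2019-05-10", "2019-05-11", "2019-05-12", "2019-05-13", "2019-05-14"],
        "Volume": [186322500, 0, 0, 79442400, 46661100],
    }
)
print(has_daily_coverage(preview))

# Filled-in days are the ones with zero volume; list their weekday names.
filled_days = pd.to_datetime(preview.loc[preview["Volume"] == 0, "Date"])
print(filled_days.dt.day_name().tolist())
```

The same check could be run on `uber_processed_data` and `lyft_processed_data` before writing them to disk.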


In [35]:

# Write the processed Uber data to a CSV file in the data folder
uber_processed_data.to_csv('./data/stock_uber.csv', index=False)

In [36]:
lyft_file_path = './data/lyft_stock.csv'
lyft_processed_data = process_stock_data(lyft_file_path)
lyft_processed_data.head()


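| | Date | Open | High | Low | Close | Adj Close | Volume | Year | Month | Day |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-03-29 | 87.33 | 88.6 | 78.02 | 78.29 | 78.29 | 71485200 | 2019 | 3 | 29 |
| 0 | 2019-03-30 | 87.33 | 88.6 | 78.02 | 78.29 | 78.29 | 0 | 2019 | 3 | 30 |
| 0 | 2019-03-31 | 87.33 | 88.6 | 78.02 | 78.29 | 78.29 | 0 | 2019 | 3 | 31 |
| 1 | 2019-04-01 | 74.9 | 75.0 | 67.78 | 69.01 | 69.01 | 41799300 | 2019 | 4 | 1 |
| 2 | 2019-04-02 | 66.9 | 70.2 | 66.1 | 68.97 | 68.97 | 22483300 | 2019 | 4 | 2 |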


In [37]:
# Write the processed Lyft data to a CSV file in the data folder
lyft_processed_data.to_csv('./data/stock_lyft.csv', index=False)


## Next

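Upload the processed files to the Amazon S3 bucket. The commands below assume they are run from the `data` folder, where `stock_uber.csv` and `stock_lyft.csv` were written.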

```shell
aws s3 cp stock_uber.csv s3://qiaoshi-aws-ml/tlc/stock/rider=uber/stock_uber.csv
aws s3 cp stock_lyft.csv s3://qiaoshi-aws-ml/tlc/stock/rider=lyft/stock_lyft.csv
```
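
The `rider=uber` and `rider=lyft` folders resemble Hive-style `key=value` partitioning, which lets some query engines treat `rider` as a column. That reading of the layout is an assumption, not something the notebook states. As a small sketch, the destination URIs can be generated rather than typed by hand; `s3_destination` is a hypothetical helper, and the bucket and prefix are copied from the commands above.

```python
def s3_destination(bucket, prefix, rider, filename):
    """Build an S3 URI with a `rider=<name>` partition folder."""
    return f"s3://{bucket}/{prefix}/rider={rider}/{filename}"


commands = []
for rider in ("uber", "lyft"):
    filename = f"stock_{rider}.csv"
    uri = s3_destination("qiaoshi-aws-ml", "tlc/stock", rider, filename)
    commands.append(f"aws s3 cp {filename} {uri}")

# Print the upload commands; this sketch does not run them.
print("\n".join(commands))
```

This only builds and prints the command strings, so it can be checked without AWS credentials.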

