# README
---

## For the reader

<h4> Note that the following boxes appear in this notebook: </h4>

<div class="alert alert-block alert-success"><b>Green boxes:</b> Comments about results. </div>

<div class="alert alert-block alert-warning"><b>Yellow boxes:</b> Notes and warnings. </div>

<div class="alert alert-block alert-danger"><b>Red boxes:</b> Places you need to alter the code if you want to run it for yourself. </div>

## Brief Overview

<ul>
    <li><a href="#importing-necessary-libraries">1. Importing Necessary Libraries</a></li>
    <li><a href="#data-import-and-preparation">2. Data Import and Preparation</a>
        <ul>
            <li>2.1 Import Data
                <ul>
                    <li>Option 1: Importing Data from SQL</li>
                    <li>Option 2: Importing Data from CSV</li>
                </ul>
            </li>
            <li>2.2 Setting and Checking Column Data Types
                <ul>
                    <li>2.2.1 Assigning Data Types</li>
                    <li>2.2.2 Verifying Data Types</li>
                </ul>
            </li>
        </ul>
    </li>
    <li><a href="#exploratory-data-analysis">3. Exploratory Data Analysis</a>
        <ul>
            <li><a href="#descriptive-statistics">3.1 Descriptive Statistics</a></li>
            <li><a href="#data-visualization">3.2 Data Visualization</a>
                <ul>
                    <li><a href="#about-number-of-rides">3.2.1 About Number of Rides</a>
                        <ul>
                            <li>A. Pie Chart: Number of Rides (Members vs Casuals)</li>
                            <li>B. Weekly Ride Distribution (Members vs Casuals)</li>
                            <li>C. Monthly Ride Distribution (Members vs Casuals)</li>
                        </ul>
                    </li>
                    <li><a href="#about-ride-duration">3.2.2 About Ride Duration</a>
                        <ul>
                            <li>A. Weekly Ride Duration (Members vs Casuals)</li>
                            <li>B. Monthly Ride Duration (Members vs Casuals)</li>
                        </ul>
                    </li>
                    <li><a href="#about-rideable-type">3.2.3 About Rideable Type</a>
                        <ul>
                            <li>A. Bar Chart: Rideable Type (Electric vs Classic Bikes)</li>
                            <li>B. Stacked Bar Chart: Rideable Types for Members vs Casuals</li>
                        </ul>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
</ul>


# 1. Importing Necessary Libraries <a id="importing-necessary-libraries"></a>
---

# 2. Data Import and Preparation <a id="data-import-and-preparation"></a>
---

## 2.1 Import Data
---

### Option 1: Importing Data from SQL

### Option 2: Importing Data from CSV

## 2.2 Setting and Checking Column Data Types
---

### 2.2.1 Assigning Data Types

### 2.2.2 Verifying Data Types

# 3. Exploratory Data Analysis <a id="exploratory-data-analysis"></a>
---

## 3.1 Descriptive Statistics <a id="descriptive-statistics"></a>
---

## 3.2 Data Visualization <a id="data-visualization"></a>
---

### 3.2.1 About Number of Rides <a id="about-number-of-rides"></a>
---

#### A. Pie Chart: Number of Rides (Members vs Casuals)

#### B. Weekly Ride Distribution (Members vs Casuals)

#### C. Monthly Ride Distribution (Members vs Casuals)

### 3.2.2 About Ride Duration <a id="about-ride-duration"></a>
---

#### A. Weekly Ride Duration (Members vs Casuals)

#### B. Monthly Ride Duration (Members vs Casuals)

### 3.2.3 About Rideable Type <a id="about-rideable-type"></a>
---

#### A. Bar Chart: Rideable Type (Electric vs Classic Bikes)

#### B. Stacked Bar Chart: Rideable Types for Members vs Casuals

In [2]:
import os # for file directories
import numpy as np 
import pandas as pd

In [14]:
# Directory containing the CSV files
directory = r'C:\0.Sync\Coding\Data\Projects\CyclisticBike-Share'

# List all files in the directory, then pick only .csv files
files = os.listdir(directory)
csv_files = [file for file in files if file.endswith('.csv')]

# Create empty list to store DataFrames
dfs = []

# Iterate over each CSV file and read it into a DataFrame
for file in csv_files:
    file_path = os.path.join(directory, file)
    df_next = pd.read_csv(file_path)
    dfs.append(df_next)

# Concatenate all DataFrames into a single DataFrame
df = pd.concat(dfs, ignore_index=True)

# Now df contains all the data from all CSV files in the directory
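
The loading step above can also be wrapped in a small reusable helper. The following is a minimal sketch, assuming every CSV file in the folder shares the same column layout. The function name `load_csv_folder` is hypothetical and not part of the original notebook. It sorts the file paths before reading so that the row order of the combined DataFrame is reproducible, because `os.listdir` returns entries in arbitrary order.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(directory):
    """Read every .csv file in `directory` and stack them into one DataFrame.

    Hypothetical helper (not from the original notebook). Assumes all files
    share the same column layout.
    """
    # Sort the paths so the concatenation order is deterministic
    csv_paths = sorted(Path(directory).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No .csv files found in {directory}")

    # Read each file, then concatenate with a fresh 0..n-1 index
    frames = [pd.read_csv(path) for path in csv_paths]
    return pd.concat(frames, ignore_index=True)
```

With the `directory` variable defined above, a call such as `df = load_csv_folder(directory)` should produce the same combined data as the loop, and it raises an error early if the folder contains no CSV files.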


In [7]:
# Collect the names of all columns that contain at least one missing value
columns_with_missing_values = df.columns[df.isna().any()].tolist()

# Print the columns with missing values
print(columns_with_missing_values)

['start_station_name', 'start_station_id', 'end_station_name', 'end_station_id', 'end_lat', 'end_lng']


In [8]:
# Count the missing values in each column of df
missing_values_counts = df.isna().sum()

# Keep only the columns that have at least one missing value
columns_with_missing_values = missing_values_counts[missing_values_counts > 0]

# Print the counts of missing values for each column
print(columns_with_missing_values)

start_station_name    875716
start_station_id      875848
end_station_name      929202
end_station_id        929343
end_lat                 6990
end_lng                 6990
dtype: int64
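
Raw counts like those above are easier to judge next to the share of rows they represent. The following is a minimal sketch of such a summary. The helper name `missing_value_summary` and the column labels `missing_count` and `missing_pct` are hypothetical and not part of the original notebook.

```python
import pandas as pd


def missing_value_summary(df):
    """Return the count and percentage of missing values per affected column.

    Hypothetical helper (not from the original notebook).
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        # Share of rows with a missing value in this column, in percent
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_count", ascending=False
    )
```

Calling `print(missing_value_summary(df))` would list the same six columns as above, each with its percentage of missing rows, which helps when deciding whether to drop or keep the affected rows.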
