The dataset, 2017-fordgobike-tripdata.csv, is downloaded from the link given below and is licensed by Ford GoBike. It includes 519,700 trips with 15 features, such as locations, times, and user attributes, for a bike-sharing system. Bike sharing is gaining popularity because it provides a convenient solution for short-distance transportation, especially where typical public transportation does not provide thorough coverage. Riders can pick up a bike at one station and drop it off at any other station in the network. The system has two types of users: subscribers and customers.
This document explores this dataset, which contains information about individual rides in the Ford GoBike sharing system during 2017. Before the analysis, I clean the dataset into a df_clean dataframe. My goal is to conduct an exploratory and explanatory data analysis of the Ford GoBike trip dataset. I use Python data science and data visualization libraries to explore the dataset's variables and understand the data's structure, oddities, patterns, and relationships. The analysis in this part is structured, going from simple univariate relationships up through multivariate relationships, but it does not need to be clean or perfect. The dataset is available at: https://s3.amazonaws.com/fordgobike-data/2017-fordgobike-tripdata.csv
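The cleaning steps themselves are not listed in this summary, so the following is only a minimal sketch of how a df_clean dataframe might be built. The column names (duration_sec, start_time, end_time, start_station_name, user_type, member_birth_year, member_gender) are assumed to match the CSV header, the clean_trips helper and the derived member_age column are hypothetical, and the inline sample rows are made up so that the example runs without downloading the file.

```python
import io

import pandas as pd

# Tiny inline sample standing in for 2017-fordgobike-tripdata.csv.
# Column names are assumed to match the real file; the values are made up.
SAMPLE_CSV = """duration_sec,start_time,end_time,start_station_name,user_type,member_birth_year,member_gender
650,2017-10-03 08:05:12.1230,2017-10-03 08:16:02.1230,Station A,Subscriber,1985,Male
1800,2017-10-07 12:30:00.0000,2017-10-07 13:00:00.0000,Station B,Customer,,
420,2017-11-15 17:10:45.5000,2017-11-15 17:17:45.5000,Station A,Subscriber,1979,Female
"""


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw trips (hypothetical cleaning steps)."""
    df_clean = df.copy()  # leave the raw dataframe untouched
    # Parse the timestamp strings into datetimes.
    for col in ("start_time", "end_time"):
        df_clean[col] = pd.to_datetime(df_clean[col])
    # Low-cardinality text columns are stored as categories.
    for col in ("user_type", "member_gender"):
        df_clean[col] = df_clean[col].astype("category")
    # Age in 2017; stays NaN when the birth year is missing.
    df_clean["member_age"] = 2017 - df_clean["member_birth_year"]
    return df_clean


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
df_clean = clean_trips(df)
print(df_clean.dtypes)
```

For the real data, the io.StringIO(SAMPLE_CSV) argument would be replaced by the path to the downloaded CSV file.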
Univariate exploration: There are more trips in the morning and afternoon than at night. There are also more trips on weekdays and fewer on weekends. Subscribers outnumber customers. Male riders take about 3 times as many trips as female riders. Most riders are 30 to 40 years old, and trip durations are typically around 650 seconds.
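The univariate findings above rely on features derived from the raw columns, such as the start hour, the day of the week, the period of the day, and the age group. A minimal sketch of how these could be derived and counted is shown here. The column names, the period boundaries (morning before 12:00, afternoon before 18:00, night afterwards), and the 10-year age bins are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd

# Made-up trips; column names are assumed to match the dataset.
df_clean = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2017-10-02 08:10:00",  # Monday morning
        "2017-10-02 17:30:00",  # Monday afternoon
        "2017-10-04 08:45:00",  # Wednesday morning
        "2017-10-07 12:15:00",  # Saturday afternoon
        "2017-10-08 22:05:00",  # Sunday night
    ]),
    "member_birth_year": [1985, 1979, 1990, 1982, 1995],
})

# Time features taken from the start timestamp.
df_clean["start_hour"] = df_clean["start_time"].dt.hour
df_clean["day_of_week"] = df_clean["start_time"].dt.day_name()

# Hypothetical period boundaries: [0, 12) morning, [12, 18) afternoon, [18, 24) night.
df_clean["period"] = pd.cut(
    df_clean["start_hour"],
    bins=[0, 12, 18, 24],
    right=False,
    labels=["morning", "afternoon", "night"],
)

# Age in 2017, grouped into 10-year bins such as [30, 40).
df_clean["member_age"] = 2017 - df_clean["member_birth_year"]
df_clean["age_group"] = pd.cut(
    df_clean["member_age"], bins=list(range(10, 91, 10)), right=False
)

# Univariate counts: one variable at a time.
trips_per_period = df_clean["period"].value_counts()
trips_per_day = df_clean["day_of_week"].value_counts()
trips_per_age_group = df_clean["age_group"].value_counts()
print(trips_per_period)
```

Each of these count series can then be passed to a bar plot to view the distribution of one variable at a time.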
Bivariate exploration: Weekdays have more trips than weekends. Most trips took place in October, when the weather is good enough for riding. The number of trips gradually increases from August to October and decreases in November and December. This seasonal pattern suggests that ridership is closely tied to the weather. Subscribers take many more trips than customers regardless of gender, period of the day, or day of the week. Subscribers have a higher median age than customers. Member age and trip duration are negatively correlated, but the correlation is weak. The median member age is almost the same (about 36 years) across the 4 most frequent start stations, except for San Francisco Ferry Building (Harry Bridges Plaza), which has a higher median age (about 42 years). The average member age is about 37.90 for males, which is higher than for females (35.1) and for the other gender category (36.25). Among all riders, male subscribers are the largest group and female customers are the smallest. Subscribers aged 30-40 account for the greatest percentage of total trips (35.86%), and customers aged 70-80 account for the smallest (0.02%). This gives a feel for user behavior across gender, age, and subscription status. Most riders start their trips around 8:00 and 17:00. Most trips take place between 7:00 and 17:00, and fewer trips take place at night.
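Several of the bivariate figures above (the share of total trips per user type and age group, the median age per user type, and the correlation between age and trip duration) can be computed along the following lines. This is a sketch under stated assumptions: the column names user_type, member_age, and duration_sec are assumed, the age-group labels are illustrative, and the sample values are made up, so the numbers printed here are not the ones reported above.

```python
import pandas as pd

# Made-up trips; column names are assumed to match the cleaned dataset.
df_clean = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"],
    "member_age": [32, 38, 45, 27, 33],
    "duration_sec": [600, 540, 480, 1500, 1200],
})

# 10-year age groups with plain string labels (illustrative bin edges).
df_clean["age_group"] = pd.cut(
    df_clean["member_age"],
    bins=[20, 30, 40, 50],
    right=False,
    labels=["20-30", "30-40", "40-50"],
).astype(str)

# Percentage of all trips for each user type / age group pair.
pct_of_trips = pd.crosstab(
    df_clean["user_type"], df_clean["age_group"], normalize="all"
) * 100

# Median age per user type.
median_age = df_clean.groupby("user_type")["member_age"].median()

# Pearson correlation between member age and trip duration.
age_duration_corr = df_clean["member_age"].corr(df_clean["duration_sec"])

print(pct_of_trips.round(2))
print(median_age)
print(round(age_duration_corr, 3))
```

Passing normalize="all" divides every cell by the total number of trips, which is what "percentage of the total trips" requires. Normalizing by "index" instead would give the age distribution within each user type.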
Multivariate exploration: Separating the user types, customers and subscribers, gives more insight. Customers like to bike during the weekend and in the summer. Subscribers, on the other hand, take more trips on weekdays; after the launch, their number of trips gradually increases and then decreases as the weather becomes colder. Subscribers also take many more trips than customers. Customers ride longer than subscribers, and both groups have their longest trips at night. I am not sure why night trips are longer; more data is needed to investigate this. Trip duration is longest for female riders and shortest for male riders, with riders of other gender in between. Male subscribers have the highest mean age and female customers have the lowest. Trip duration is usually longer on weekends than on weekdays. I think this is because riders use the system on weekends for leisure and sightseeing, with no rush, so their trips take longer. On weekdays, most riders start their trips around 8:00 and 17:00, most trips take place between 7:00 and 17:00, and fewer trips take place at night. On weekends, most trips start around noon.
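The comparisons above combine three variables at a time, for example trip duration by user type and by weekday versus weekend. A minimal sketch of that kind of grouping is shown here. The column names and the day_type and start_hour helper columns are assumptions for illustration, and the sample rows are made up.

```python
import pandas as pd

# Made-up trips; column names are assumed to match the cleaned dataset.
df_clean = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2017-10-02 08:10:00",  # Monday
        "2017-10-03 17:20:00",  # Tuesday
        "2017-10-07 12:00:00",  # Saturday
        "2017-10-07 13:30:00",  # Saturday
        "2017-10-08 12:45:00",  # Sunday
        "2017-10-04 08:30:00",  # Wednesday
    ]),
    "user_type": [
        "Subscriber", "Subscriber", "Customer",
        "Subscriber", "Customer", "Customer",
    ],
    "duration_sec": [500, 550, 1800, 900, 2000, 1000],
})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = df_clean["start_time"].dt.dayofweek >= 5
df_clean["day_type"] = is_weekend.map({True: "weekend", False: "weekday"})
df_clean["start_hour"] = df_clean["start_time"].dt.hour

# Mean trip duration for every user type / day type combination.
mean_duration = df_clean.pivot_table(
    index="user_type", columns="day_type", values="duration_sec", aggfunc="mean"
)

# Most common start hour on weekdays and on weekends.
busiest_hour = df_clean.groupby("day_type")["start_hour"].agg(
    lambda hours: hours.mode().iloc[0]
)

print(mean_duration)
print(busiest_hour)
```

The resulting mean_duration table has one row per user type and one column per day type, which maps directly onto a grouped bar chart or a heatmap.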