Examples of Python used to conduct Instacart Grocery Basket Analysis in a study project

Senja-P/Python-Grocery-Basket

Python-Grocery-Basket

Project Overview

  • Instacart is an online grocery store that operates through an app. In this study project, the marketing and sales teams wanted a better understanding of the customers and their purchasing behaviours in order to plan targeted marketing strategies.

  • My role as a Data Analyst was to answer the following key questions:

    a) Who are the customers?

    What are the customer demographics? Are there differences in ordering habits based on a customer’s loyalty status and region?

    b) What do they buy?

    Are certain types of products more popular, and are there differences in customers’ behaviour?

    c) When do they shop?

    What are the busiest days of the week and hours of the day, and are there particular times when customers spend the most money?

  • Customer behaviours were analysed against loyalty status and household type. The age and dependant status variables were used to form the household types. The loyalty categories were based on the number of orders. The US states were divided into four geographical regions: Northeast, Midwest, South and West.

  • Households with children placed more online orders in each loyalty group than those without children. Almost 50% of all orders were placed by customers over the age of 40. Most orders were placed in the South, followed by the West and the Midwest, but there were no significant differences at the regional level when order percentages were compared against customer numbers in each region. The most popular product types were produce, dairy & eggs and snacks. The weekends were the busiest days. Orders declined from 5pm and picked up again from 10pm, so my recommendation was to increase marketing between 5pm and 9pm.

  • A copy of the PowerPoint presentation is uploaded above.
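The derived variables described above (region, loyalty category and household type) could be built in pandas roughly as follows. This is a minimal sketch, not the project's actual script: the column names, the toy data, the loyalty cut-offs (10 and 40 orders), the age split at 40 and the rule that any dependant counts as "with children" are all assumptions made for illustration, and the state lookup covers only four states.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the project's schema.
customers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "state": ["Texas", "Ohio", "California", "New York"],
    "age": [25, 52, 67, 38],
    "n_dependants": [0, 2, 0, 3],
    "max_order": [4, 15, 45, 41],
})

# Illustrative subset of a state-to-region lookup (one state per region).
REGION_BY_STATE = {
    "New York": "Northeast",
    "Ohio": "Midwest",
    "Texas": "South",
    "California": "West",
}
customers["region"] = customers["state"].map(REGION_BY_STATE)

# Loyalty category from the number of orders; the cut-offs are assumed.
# Bins are right-inclusive: (0, 10], (10, 40], (40, inf).
customers["loyalty_flag"] = pd.cut(
    customers["max_order"],
    bins=[0, 10, 40, np.inf],
    labels=["New customer", "Regular customer", "Loyal customer"],
)

# Household type from age and dependant status; both rules are assumed.
age_band = pd.Series(
    np.where(customers["age"] < 40, "Under 40", "40 and over"),
    index=customers.index,
)
children = pd.Series(
    np.where(customers["n_dependants"] > 0, "with children", "no children"),
    index=customers.index,
)
customers["household_type"] = age_band + ", " + children
```

States missing from the lookup come out of `map` as `NaN`, which makes an incomplete mapping easy to spot with `customers["region"].isnull().sum()`.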

Analytics

  • Descriptive analysis: the purpose was to use past data to inform marketing
  • Installed libraries and imported dataframes
  • Completed consistency checks for missing values and duplicates
  • Data wrangling, e.g. dropping and renaming columns
  • Merged dataframes
  • Grouped, aggregated and created new variables to answer the business questions
  • Created visuals and exported dataframes
  • Created a population flow
  • A copy of the analysis steps is uploaded above
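The steps listed above (consistency checks, wrangling, merging and aggregating) might look like this in pandas. It is a sketch under stated assumptions rather than the project's code: the two miniature tables, their column names and the renamed column are hypothetical stand-ins for the real data sets.

```python
import pandas as pd

# Hypothetical miniature tables; names and values are assumptions.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "user_id": [10, 10, 10, 11, 12],
    "orders_day_of_week": [0, 0, 0, 1, 0],
    "order_hour_of_day": [9, 17, 17, 10, 9],
    "days_since_prior_order": [None, 7.0, 7.0, None, 3.0],
})
order_items = pd.DataFrame({
    "order_id": [1, 2, 3, 3, 4],
    "prices": [4.5, 2.0, 7.5, 1.0, 3.0],
})

# Consistency checks: missing values per column and fully duplicated rows.
missing_per_column = orders.isnull().sum()
n_duplicates = int(orders.duplicated().sum())
orders_clean = orders.drop_duplicates()

# Wrangling: rename a column to a clearer name.
orders_clean = orders_clean.rename(
    columns={"orders_day_of_week": "order_day_of_week"}
)

# Merge; indicator=True adds a "_merge" column showing where each row matched.
merged = orders_clean.merge(
    order_items, on="order_id", how="inner", indicator=True
)

# Aggregate: distinct orders per day of week and total spend per hour of day.
orders_per_day = merged.groupby("order_day_of_week")["order_id"].nunique()
spend_per_hour = merged.groupby("order_hour_of_day")["prices"].sum()
```

Counting distinct `order_id` values rather than rows matters after the merge, because an order with several products appears once per product.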

Technical skills utilised

  • Tools and libraries used: Jupyter, pandas, NumPy, os, matplotlib, seaborn, scipy. Examples of the Python scripts are available above
  • Excel, PowerPoint and Python were used for visualisations. The PowerPoint presentation is available above

Data

  • Size: 35 million rows
  • Data sets: products, orders and customers. The customer data set and the prices column in the products data set were fabricated for training purposes by CareerFoundry.
  • Year: 2017

Data Source

Return to the main portfolio page