# Data Wrangling & Subsetting

# Contents
1. Import libraries
2. Import datasets
3. Check imported data
4. Data wrangling
5. Data dictionaries
6. Create subsets using filters
7. Export dataframes

# 1. Import libraries

In [6]:
#Import libraries
import pandas as pd
import numpy as np
import os

# 2. Import datasets

In [8]:
#Define the project path
path = r'/Users/davidgriesel/Documents/0 - CareerFoundry/02 - Data Analytics Immersion/04 - Python Fundamentals for Data Analysts/Instacart Basket Analysis - IC 202409'

## 2.1 Orders

In [10]:
#Import 'orders' dataset
df_orders = pd.read_csv(os.path.join(path, '02 - Data', 'Original Data', 'orders.csv'), index_col = False)

## 2.2 Products

In [12]:
#Import 'products' dataset
df_products = pd.read_csv(os.path.join(path, '02 - Data', 'Original Data', 'products.csv'), index_col = False)

## 2.3 Departments

In [14]:
#Import 'departments' dataset
df_departments = pd.read_csv(os.path.join(path, '02 - Data', 'Original Data', 'departments.csv'), index_col = False)

# 3. Check imported data

## 3.1 Orders

In [17]:
# Get dimensions
df_orders.shape

(3421083, 7)

In [18]:
#View first 5 rows of the dataframe
df_orders.head()

Unnamed: 0,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
0,2539329,1,prior,1,2,8,
1,2398795,1,prior,2,3,7,15.0
2,473747,1,prior,3,3,12,21.0
3,2254736,1,prior,4,4,7,29.0
4,431534,1,prior,5,4,15,28.0


## 3.2 Products

In [20]:
# Get dimensions
df_products.shape

(49693, 5)

In [21]:
#View first 5 rows of the dataframe
df_products.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
0,1,Chocolate Sandwich Cookies,61,19,5.8
1,2,All-Seasons Salt,104,13,9.3
2,3,Robust Golden Unsweetened Oolong Tea,94,7,4.5
3,4,Smart Ones Classic Favorites Mini Rigatoni Wit...,38,1,10.5
4,5,Green Chile Anytime Sauce,5,13,4.3


## 3.3 Departments

In [23]:
# Get dimensions
df_departments.shape

(1, 22)

In [24]:
#View first 5 rows of the dataframe
df_departments.head()

Unnamed: 0,department_id,1,2,3,4,5,6,7,8,9,...,12,13,14,15,16,17,18,19,20,21
0,department,frozen,other,bakery,produce,alcohol,international,beverages,pets,dry goods pasta,...,meat seafood,pantry,breakfast,canned goods,dairy eggs,household,babies,snacks,deli,missing


# 4. Data wrangling

## 4.1. Orders

### 4.1.1. Drop columns

#### 4.1.1.1. Drop 'eval_set' column

In [29]:
# View column names
df_orders.columns

Index(['order_id', 'user_id', 'eval_set', 'order_number', 'order_dow',
       'order_hour_of_day', 'days_since_prior_order'],
      dtype='object')

##### 'eval_set' column contains extraneous information

In [31]:
#Remove 'eval_set' column and update the dataframe with results
df_orders = df_orders.drop(columns = ['eval_set'])

In [32]:
#View first 5 rows of updated dataframe
df_orders.head()

Unnamed: 0,order_id,user_id,order_number,order_dow,order_hour_of_day,days_since_prior_order
0,2539329,1,1,2,8,
1,2398795,1,2,3,7,15.0
2,473747,1,3,3,12,21.0
3,2254736,1,4,4,7,29.0
4,431534,1,5,4,15,28.0


#### 4.1.1.2. Frequency distribution 'days_since_prior_order' column

In [34]:
#Display the frequency of unique values in the column
df_orders['days_since_prior_order'].value_counts(dropna = False)

days_since_prior_order
30.0    369323
7.0     320608
6.0     240013
4.0     221696
3.0     217005
5.0     214503
NaN     206209
2.0     193206
8.0     181717
1.0     145247
9.0     118188
14.0    100230
10.0     95186
13.0     83214
11.0     80970
12.0     76146
0.0      67755
15.0     66579
16.0     46941
21.0     45470
17.0     39245
20.0     38527
18.0     35881
19.0     34384
22.0     32012
28.0     26777
23.0     23885
27.0     22013
24.0     20712
25.0     19234
29.0     19191
26.0     19016
Name: count, dtype: int64

##### The variable contains 206209 null values. This alone does not justify removing the column; investigate further during consistency checks
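
The null count (206209) matches the maximum 'user_id' reported by describe() in section 4.1.3, which suggests, but does not prove, that the nulls mark each user's first order. A minimal sketch of how that could be checked, using a hypothetical five-row sample rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the orders data: two users, five orders
sample = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2],
    'order_number': [1, 2, 3, 1, 2],
    'days_since_prior_order': [np.nan, 15.0, 21.0, np.nan, 7.0],
})

# Rows where the gap since the previous order is missing
missing_gap = sample['days_since_prior_order'].isna()

# True only if every missing gap belongs to a user's first order
nulls_are_first_orders = bool((sample.loc[missing_gap, 'order_number'] == 1).all())

# True only if every first order has a missing gap
first_orders_are_null = bool(missing_gap[sample['order_number'] == 1].all())

print(nulls_are_first_orders, first_orders_are_null)
```

If both checks hold on the full dataframe, the nulls are structural rather than data-entry gaps.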

#### 4.1.1.3. Frequency distribution 'order_hour_of_day' column (Task)

In [37]:
#Display the frequency of unique values in the column
df_orders['order_hour_of_day'].value_counts(dropna = True)

order_hour_of_day
10    288418
11    284728
15    283639
14    283042
13    277999
12    272841
16    272553
9     257812
17    228795
18    182912
8     178201
19    140569
20    104292
7      91868
21     78109
22     61468
23     40043
6      30529
0      22758
1      12398
5       9569
2       7539
4       5527
3       5474
Name: count, dtype: int64

##### The busiest hour for placing orders is 10am
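
The busiest hour can also be read off programmatically instead of by eye. A small sketch, assuming a hypothetical series in place of df_orders['order_hour_of_day']:

```python
import pandas as pd

# Hypothetical hour-of-day values standing in for df_orders['order_hour_of_day']
hours = pd.Series([10, 10, 11, 15, 10, 11, 9], name='order_hour_of_day')

counts = hours.value_counts()   # frequency of each hour, highest first
busiest_hour = counts.idxmax()  # index label (the hour) with the largest count
busiest_count = counts.max()    # number of orders placed in that hour

print(busiest_hour, busiest_count)
```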

### 4.1.2. Rename columns

#### 4.1.2.1. Rename 'order_dow' column

##### Unclear column header: 'order_dow'

In [42]:
#Rename column to 'orders_day_of_week'
df_orders.rename(columns = {'order_dow' : 'orders_day_of_week'}, inplace = True)

In [43]:
#View first 5 rows of updated dataframe
df_orders.head()

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
0,2539329,1,1,2,8,
1,2398795,1,2,3,7,15.0
2,473747,1,3,3,12,21.0
3,2254736,1,4,4,7,29.0
4,431534,1,5,4,15,28.0


#### 4.1.2.2. Rename 'days_since_prior_order' column (Task)

In [45]:
#Rename column to 'days_since_last_order' (without saving) and view
df_orders.rename(columns = {'days_since_prior_order' : 'days_since_last_order'}, inplace = False)

Unnamed: 0,order_id,user_id,order_number,orders_day_of_week,order_hour_of_day,days_since_last_order
0,2539329,1,1,2,8,
1,2398795,1,2,3,7,15.0
2,473747,1,3,3,12,21.0
3,2254736,1,4,4,7,29.0
4,431534,1,5,4,15,28.0
...,...,...,...,...,...,...
3421078,2266710,206209,10,5,18,29.0
3421079,1854736,206209,11,4,10,30.0
3421080,626363,206209,12,1,12,18.0
3421081,2977660,206209,13,1,12,7.0
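
Because the rename above was not saved, later cells still show 'days_since_prior_order'. The sketch below illustrates that behaviour on a hypothetical two-row dataframe: rename() returns a renamed copy and leaves the original untouched unless inplace = True is set or the result is assigned back.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_orders
orders = pd.DataFrame({'order_id': [1, 2], 'days_since_prior_order': [None, 15.0]})

# Without inplace=True, rename() returns a renamed copy
preview = orders.rename(columns={'days_since_prior_order': 'days_since_last_order'})

print(list(preview.columns))  # columns of the renamed copy
print(list(orders.columns))   # columns of the original, unchanged
```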


#### 4.1.2.3. Rename 'order_number' column

In [47]:
#Rename column to 'user_order_number'
df_orders.rename(columns = {'order_number' : 'user_order_number'}, inplace = True)

In [48]:
#View first 5 rows of updated dataframe
df_orders.head()

Unnamed: 0,order_id,user_id,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
0,2539329,1,1,2,8,
1,2398795,1,2,3,7,15.0
2,473747,1,3,3,12,21.0
3,2254736,1,4,4,7,29.0
4,431534,1,5,4,15,28.0


### 4.1.3. Change data types

In [50]:
#Generate descriptive statistics for numeric columns in the dataframe
df_orders.describe()

Unnamed: 0,order_id,user_id,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
count,3421083.0,3421083.0,3421083.0,3421083.0,3421083.0,3214874.0
mean,1710542.0,102978.2,17.15486,2.776219,13.45202,11.11484
std,987581.7,59533.72,17.73316,2.046829,4.226088,9.206737
min,1.0,1.0,1.0,0.0,0.0,0.0
25%,855271.5,51394.0,5.0,1.0,10.0,4.0
50%,1710542.0,102689.0,11.0,3.0,13.0,7.0
75%,2565812.0,154385.0,23.0,5.0,16.0,15.0
max,3421083.0,206209.0,100.0,6.0,23.0,30.0


In [51]:
#Return the data types of columns in the dataframe
df_orders.dtypes

order_id                    int64
user_id                     int64
user_order_number           int64
orders_day_of_week          int64
order_hour_of_day           int64
days_since_prior_order    float64
dtype: object

#### 4.1.3.1. Change 'order_id' data type 

In [53]:
#Change data type of the column to string and update the dataframe with results
df_orders['order_id'] = df_orders['order_id'].astype('str')

In [54]:
#Display the datatype of the updated column
df_orders['order_id'].dtype

dtype('O')

In [55]:
#Generate descriptive statistics for numeric columns in the updated dataframe
df_orders.describe()

Unnamed: 0,user_id,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
count,3421083.0,3421083.0,3421083.0,3421083.0,3214874.0
mean,102978.2,17.15486,2.776219,13.45202,11.11484
std,59533.72,17.73316,2.046829,4.226088,9.206737
min,1.0,1.0,0.0,0.0,0.0
25%,51394.0,5.0,1.0,10.0,4.0
50%,102689.0,11.0,3.0,13.0,7.0
75%,154385.0,23.0,5.0,16.0,15.0
max,206209.0,100.0,6.0,23.0,30.0
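
Once an identifier is stored as text, describe() drops it from the numeric summary, which is the point of the conversion: a mean of order ids carries no meaning. A minimal sketch on a hypothetical three-row dataframe; calling describe() on the column itself is one way to still get a summary suited to identifiers:

```python
import pandas as pd

# Hypothetical stand-in for df_orders with one identifier and one measure
orders = pd.DataFrame({'order_id': [2539329, 2398795, 473747],
                       'order_hour_of_day': [8, 7, 12]})
orders['order_id'] = orders['order_id'].astype('str')

# The default summary covers numeric columns only
numeric_summary = orders.describe()

# Summary of the text column: count, unique, top, freq
id_summary = orders['order_id'].describe()

print(list(numeric_summary.columns))
print(id_summary['unique'])
```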


#### 4.1.3.2. Change 'user_id' data type (Task)

In [57]:
#Change data type of user_id column to string and update the dataframe with results
df_orders['user_id'] = df_orders['user_id'].astype('str')

In [58]:
#Display the datatype of the updated column
df_orders['user_id'].dtype

dtype('O')

In [59]:
#Generate descriptive statistics for numeric columns in the updated dataframe
df_orders.describe()

Unnamed: 0,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
count,3421083.0,3421083.0,3421083.0,3214874.0
mean,17.15486,2.776219,13.45202,11.11484
std,17.73316,2.046829,4.226088,9.206737
min,1.0,0.0,0.0,0.0
25%,5.0,1.0,10.0,4.0
50%,11.0,3.0,13.0,7.0
75%,23.0,5.0,16.0,15.0
max,100.0,6.0,23.0,30.0


## 4.2. Departments

### 4.2.1. Transpose departments dataframe

#### 4.2.1.1. View dataframe

In [63]:
#View the first 5 rows of the dataframe
df_departments.head()

Unnamed: 0,department_id,1,2,3,4,5,6,7,8,9,...,12,13,14,15,16,17,18,19,20,21
0,department,frozen,other,bakery,produce,alcohol,international,beverages,pets,dry goods pasta,...,meat seafood,pantry,breakfast,canned goods,dairy eggs,household,babies,snacks,deli,missing


#### 4.2.1.2. Transpose dataframe

In [65]:
#Transpose the dataframe switching rows and columns and view results
df_departments.T

Unnamed: 0,0
department_id,department
1,frozen
2,other
3,bakery
4,produce
5,alcohol
6,international
7,beverages
8,pets
9,dry goods pasta


In [66]:
#Transpose the dataframe and update the dataframe with results
df_departments = df_departments.T

#### 4.2.1.3. Reset index

In [68]:
#Preview the dataframe with a reset index (the result is not saved)
df_departments.reset_index()

Unnamed: 0,index,0
0,department_id,department
1,1,frozen
2,2,other
3,3,bakery
4,4,produce
5,5,alcohol
6,6,international
7,7,beverages
8,8,pets
9,9,dry goods pasta


#### 4.2.1.4. Create 'new_header' variable

In [70]:
#Select row 0 of the departments dataframe that contains the headings and assign it to a variable
new_header = df_departments.iloc[0]

In [71]:
#View the created variable
new_header

0    department
Name: department_id, dtype: object

#### 4.2.1.5. Slice off old header

In [73]:
#Select all rows from the departments dataframe from row 1 onward and update the dataframe with results
df_departments = df_departments[1:]

In [74]:
#View updated dataframe
df_departments

Unnamed: 0,0
1,frozen
2,other
3,bakery
4,produce
5,alcohol
6,international
7,beverages
8,pets
9,dry goods pasta
10,bulk


#### 4.2.1.6. Add new header

In [76]:
#Assign the 'new_header' variable as the column names
df_departments.columns = new_header

In [77]:
#View first 5 rows of the updated dataframe
df_departments.head()

department_id,department
1,frozen
2,other
3,bakery
4,produce
5,alcohol
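
The steps in 4.2.1 can be recapped in one place. This is a sketch on a hypothetical three-department dataframe, not the real file; the last two lines are an optional tidy-up that is not part of the notebook and only moves the 'department_id' label from the column axis to the index.

```python
import pandas as pd

# Hypothetical wide layout mirroring departments.csv: one row, ids as column headers
wide = pd.DataFrame([['department', 'frozen', 'other', 'bakery']],
                    columns=['department_id', '1', '2', '3'])

long = wide.T                   # ids become the index, names become column 0
header = long.iloc[0]           # first row holds the real column heading
long = long.iloc[1:].copy()     # keep everything below the heading row
long.columns = header           # column 0 is now called 'department'

# Optional tidy-up (an addition, not in the notebook): label the index instead
long.index.name = 'department_id'
long.columns.name = None

print(long)
```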


# 5. Data dictionaries

## 5.1. Create a data dictionary 

In [80]:
#Turn dataframe into data dictionary
data_dictionary = df_departments.to_dict('index')

In [81]:
#View created data dictionary
data_dictionary

{'1': {'department': 'frozen'},
 '2': {'department': 'other'},
 '3': {'department': 'bakery'},
 '4': {'department': 'produce'},
 '5': {'department': 'alcohol'},
 '6': {'department': 'international'},
 '7': {'department': 'beverages'},
 '8': {'department': 'pets'},
 '9': {'department': 'dry goods pasta'},
 '10': {'department': 'bulk'},
 '11': {'department': 'personal care'},
 '12': {'department': 'meat seafood'},
 '13': {'department': 'pantry'},
 '14': {'department': 'breakfast'},
 '15': {'department': 'canned goods'},
 '16': {'department': 'dairy eggs'},
 '17': {'department': 'household'},
 '18': {'department': 'babies'},
 '19': {'department': 'snacks'},
 '20': {'department': 'deli'},
 '21': {'department': 'missing'}}

## 5.2. Retrieve information - 'department_id' = 19

In [83]:
#Display the first 5 rows of the products dataframe
df_products.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
0,1,Chocolate Sandwich Cookies,61,19,5.8
1,2,All-Seasons Salt,104,13,9.3
2,3,Robust Golden Unsweetened Oolong Tea,94,7,4.5
3,4,Smart Ones Classic Favorites Mini Rigatoni Wit...,38,1,10.5
4,5,Green Chile Anytime Sauce,5,13,4.3


In [84]:
#Retrieve the value associated with department_id = 19 from the data dictionary
print(data_dictionary.get('19'))

{'department': 'snacks'}


## 5.3. Retrieve information - 'department_id' = 4 (Task)

In [86]:
#Retrieve the value associated with department_id = 4 from the data dictionary
print(data_dictionary.get('4'))

{'department': 'produce'}


##### The description for department with id = 4 is 'produce'
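
The keys of the data dictionary are strings, which is why the lookups above pass '19' and '4' in quotes. A short sketch on a hypothetical two-row dataframe shows what happens with an integer key, and how to reach the bare department name:

```python
import pandas as pd

# Hypothetical two-row stand-in for the transposed departments dataframe;
# the index labels are strings because they were originally CSV column headers
departments = pd.DataFrame({'department': ['produce', 'snacks']}, index=['4', '19'])
lookup = departments.to_dict('index')

by_string_key = lookup.get('4')   # nested dictionary for department 4
by_integer_key = lookup.get(4)    # None, because no integer key 4 exists

# Drill into the nested dictionary to get the name on its own
produce_name = lookup['4']['department']

print(by_string_key, by_integer_key, produce_name)
```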

# 6. Create subsets using filters

## 6.1. Create 'snacks' subset (Method 1)

In [90]:
#Filter the dataframe for rows where department_id equals 19 using == operator and create subset with results
df_snacks_1 = df_products[df_products['department_id'] == 19]

In [91]:
#View first 5 rows of created subset
df_snacks_1.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
0,1,Chocolate Sandwich Cookies,61,19,5.8
15,16,Mint Chocolate Flavored Syrup,103,19,5.2
24,25,Salted Caramel Lean Protein & Fiber Bar,3,19,1.9
31,32,Nacho Cheese White Bean Chips,107,19,4.9
40,41,Organic Sourdough Einkorn Crackers Rosemary,78,19,6.5


## 6.2. Create 'snacks' subset (Method 2)

In [93]:
#Filter the dataframe for rows where department_id equals 19 using the .loc indexer and create subset with results
df_snacks_2 = df_products.loc[df_products['department_id'] == 19]

In [94]:
#View first 5 rows of created subset
df_snacks_2.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
0,1,Chocolate Sandwich Cookies,61,19,5.8
15,16,Mint Chocolate Flavored Syrup,103,19,5.2
24,25,Salted Caramel Lean Protein & Fiber Bar,3,19,1.9
31,32,Nacho Cheese White Bean Chips,107,19,4.9
40,41,Organic Sourdough Einkorn Crackers Rosemary,78,19,6.5
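
Methods 1 and 2 return the same rows. The sketch below confirms this on a hypothetical four-row dataframe and shows the one practical difference: .loc can select columns in the same step.

```python
import pandas as pd

# Hypothetical stand-in for df_products
products = pd.DataFrame({
    'product_id': [1, 2, 3, 4],
    'department_id': [19, 13, 19, 7],
})

mask = products['department_id'] == 19   # boolean Series, one value per row

subset_brackets = products[mask]         # Method 1: boolean mask in brackets
subset_loc = products.loc[mask]          # Method 2: boolean mask with .loc
same_result = subset_brackets.equals(subset_loc)

# .loc also accepts a column selection alongside the row mask
ids_only = products.loc[mask, ['product_id']]

print(same_result)
```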


## 6.3. Create 'miscellaneous' subset (Multiple values)

In [96]:
#Filter the dataframe for rows where department_id equals 17, 18 or 19 using the isin() method and create subset with results
df_miscellaneous = df_products[df_products['department_id'].isin([17,18,19])]

In [97]:
#Display the frequency of unique values in the 'department_id' column
df_miscellaneous['department_id'].value_counts(dropna = False)

department_id
19    6264
17    3085
18    1081
Name: count, dtype: int64

In [98]:
#View first 5 rows of created subset
df_miscellaneous.head()


Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
0,1,Chocolate Sandwich Cookies,61,19,5.8
13,14,Fresh Scent Dishwasher Cleaner,74,17,6.5
14,15,Overnight Diapers Size 6,56,18,11.2
15,16,Mint Chocolate Flavored Syrup,103,19,5.2
24,25,Salted Caramel Lean Protein & Fiber Bar,3,19,1.9
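
isin() is shorthand for chaining equality tests with the | operator. A minimal sketch on hypothetical values, showing that both masks select the same rows:

```python
import pandas as pd

# Hypothetical department ids standing in for df_products['department_id']
dept = pd.Series([19, 13, 17, 18, 7, 19], name='department_id')

mask_isin = dept.isin([17, 18, 19])
mask_or = (dept == 17) | (dept == 18) | (dept == 19)

masks_match = bool((mask_isin == mask_or).all())
selected = dept[mask_isin].tolist()

print(masks_match, selected)
```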


## 6.4. Create 'breakfast' subset (Task)

In [100]:
#View the data dictionary
data_dictionary

{'1': {'department': 'frozen'},
 '2': {'department': 'other'},
 '3': {'department': 'bakery'},
 '4': {'department': 'produce'},
 '5': {'department': 'alcohol'},
 '6': {'department': 'international'},
 '7': {'department': 'beverages'},
 '8': {'department': 'pets'},
 '9': {'department': 'dry goods pasta'},
 '10': {'department': 'bulk'},
 '11': {'department': 'personal care'},
 '12': {'department': 'meat seafood'},
 '13': {'department': 'pantry'},
 '14': {'department': 'breakfast'},
 '15': {'department': 'canned goods'},
 '16': {'department': 'dairy eggs'},
 '17': {'department': 'household'},
 '18': {'department': 'babies'},
 '19': {'department': 'snacks'},
 '20': {'department': 'deli'},
 '21': {'department': 'missing'}}

In [101]:
#Filter the dataframe for rows where department_id equals 14 and create a subset with the results
df_breakfast = df_products[df_products['department_id'] == 14]

In [102]:
#Display the frequency of unique values in the 'department_id' column
df_breakfast['department_id'].value_counts(dropna = False)

department_id
14    1116
Name: count, dtype: int64

In [103]:
#View first 5 rows of created subset
df_breakfast.head()

Unnamed: 0,product_id,product_name,aisle_id,department_id,prices
27,28,Wheat Chex Cereal,121,14,10.1
33,34,,121,14,12.2
67,68,"Pancake Mix, Buttermilk",130,14,13.7
89,90,Smorz Cereal,121,14,3.9
210,211,Gluten Free Organic Cereal Coconut Maple Vanilla,130,14,3.6


## 6.5. Create 'dinner' subset (Task)

In [105]:
#View the data dictionary
data_dictionary

{'1': {'department': 'frozen'},
 '2': {'department': 'other'},
 '3': {'department': 'bakery'},
 '4': {'department': 'produce'},
 '5': {'department': 'alcohol'},
 '6': {'department': 'international'},
 '7': {'department': 'beverages'},
 '8': {'department': 'pets'},
 '9': {'department': 'dry goods pasta'},
 '10': {'department': 'bulk'},
 '11': {'department': 'personal care'},
 '12': {'department': 'meat seafood'},
 '13': {'department': 'pantry'},
 '14': {'department': 'breakfast'},
 '15': {'department': 'canned goods'},
 '16': {'department': 'dairy eggs'},
 '17': {'department': 'household'},
 '18': {'department': 'babies'},
 '19': {'department': 'snacks'},
 '20': {'department': 'deli'},
 '21': {'department': 'missing'}}

In [106]:
#Filter the dataframe for rows where department_id equals 5, 7, 12 or 20 (alcohol, beverages, meat seafood, deli) and create a subset with the results
df_dinner = df_products[df_products['department_id'].isin([5,7,12,20])]

In [107]:
#View results
df_dinner['department_id'].value_counts(dropna = False)

department_id
7     4365
20    1322
5     1056
12     907
Name: count, dtype: int64

### 6.5.1. View 'dinner' dimensions (Task)

In [109]:
#Get the dimensions of the created subset
df_dinner.shape

(7650, 5)

##### The dataframe has 7650 rows
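
The ids 5, 7, 12 and 20 were read from the data dictionary by eye. As a sketch of an alternative, they could be looked up by department name; the dictionary below is a hypothetical excerpt with the same structure as data_dictionary, and the keys are converted to int on the assumption that 'department_id' in the products dataframe is numeric.

```python
# Hypothetical excerpt of data_dictionary (string keys, as in this notebook)
data_dictionary = {
    '5': {'department': 'alcohol'},
    '7': {'department': 'beverages'},
    '12': {'department': 'meat seafood'},
    '14': {'department': 'breakfast'},
    '20': {'department': 'deli'},
}

# Department names requested for the 'dinner' subset
wanted = {'alcohol', 'beverages', 'meat seafood', 'deli'}

# Collect the ids whose department name is in the wanted set
dinner_ids = sorted(
    int(key)
    for key, value in data_dictionary.items()
    if value['department'] in wanted
)

print(dinner_ids)
```

The resulting list can be passed straight to isin().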

## 6.6. Create 'user_id' = 1 subset (Task)

In [112]:
#Extract filtered records for user_id 1 and create a subset with the results
df_orders_user_1 = df_orders[df_orders['user_id'] == '1']

In [113]:
#View dataframe
df_orders_user_1

Unnamed: 0,order_id,user_id,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
0,2539329,1,1,2,8,
1,2398795,1,2,3,7,15.0
2,473747,1,3,3,12,21.0
3,2254736,1,4,4,7,29.0
4,431534,1,5,4,15,28.0
5,3367565,1,6,2,7,19.0
6,550135,1,7,1,9,20.0
7,3108588,1,8,1,14,14.0
8,2295261,1,9,1,16,0.0
9,2550362,1,10,4,8,30.0


### 6.6.1. View descriptive statistics - 'user_id' = 1 (Task)

In [115]:
#Generate descriptive statistics for user_id 1
df_orders_user_1.describe()

Unnamed: 0,user_order_number,orders_day_of_week,order_hour_of_day,days_since_prior_order
count,11.0,11.0,11.0,10.0
mean,6.0,2.636364,10.090909,19.0
std,3.316625,1.286291,3.477198,9.030811
min,1.0,1.0,7.0,0.0
25%,3.5,1.5,7.5,14.25
50%,6.0,3.0,8.0,19.5
75%,8.5,4.0,13.0,26.25
max,11.0,4.0,16.0,30.0


##### The customer placed 11 orders in total. All orders fall on days 1 to 4 of the week (Sunday to Wednesday, assuming 0 = Saturday), with a median of day 3 (Tuesday). All orders were placed between 7am and 4pm, with a median hour of 8am. The gap between orders ranged from 0 days (same day) to 30 days, averaging 19 days.
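
describe() reports the mean and quartiles but not the most frequent value, so a statement about the most common day or hour needs mode() or value_counts(). A small sketch on hypothetical values (not the actual orders of user_id 1) where the median and the mode differ:

```python
import pandas as pd

# Hypothetical day-of-week values, chosen so that median and mode differ
days = pd.Series([1, 1, 2, 3, 3, 4, 4, 4], name='orders_day_of_week')

median_day = days.median()               # the 50% row of describe()
most_common_days = days.mode().tolist()  # most frequent value(s); ties return several
counts = days.value_counts().sort_index()

print(median_day, most_common_days)
print(counts)
```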

# 7. Export dataframes

## 7.1. Orders

In [119]:
#Get the dimensions of the dataframe
df_orders.shape

(3421083, 6)

In [120]:
#Export dataset
df_orders.to_csv(os.path.join(path, '02 - Data', 'Prepared Data', '04_orders_wrangled.csv'))

## 7.2. Departments

In [122]:
#Get the dimensions of the dataframe
df_departments.shape

(21, 1)

In [123]:
#Export dataset
df_departments.to_csv(os.path.join(path, '02 - Data', 'Prepared Data', '04_departments_wrangled.csv'))
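
Two side effects of a CSV round trip are worth knowing before these files are re-imported. to_csv() writes the index as an unnamed first column by default, and identifiers stored as text are inferred as integers again on reading. Whether the index should be kept is a judgment call: for df_departments it carries the department ids, while for df_orders it is only a row counter. A sketch using an in-memory buffer and a hypothetical two-row dataframe:

```python
import io
import pandas as pd

# Hypothetical stand-in for df_orders with identifiers stored as text
orders = pd.DataFrame({'order_id': ['2539329', '2398795'], 'user_id': ['1', '1']})

# Default export: the index is written as an extra, unnamed column
with_index = io.StringIO()
orders.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# Export without the index
without_index = io.StringIO()
orders.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))      # includes 'Unnamed: 0'
print(list(reloaded_without_index.columns))   # original columns only
print(reloaded_without_index['order_id'].dtype)  # numeric again after reading
```

Passing dtype={'order_id': str} to read_csv() is one way to keep the identifiers as text on re-import.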

# Task

## 1. Data wrangling
If you haven’t done so already, perform the wrangling procedures you walked through in this Exercise on your project data in a new notebook for this Exercise. Then, add a new section heading to separate your wrangling procedures from the procedures you’ll be conducting in the steps below.

##### Refer: 4. Data wrangling

## 2. Change data type for another variable
Find another identifier variable in the df_ords dataframe that doesn’t need to be included in your analysis as a numeric variable and change it to a suitable format.

##### Refer: 4.1.3.2. Change 'user_id' data type (Task)

## 3. Rename 'days_since_prior_order' column
Look for a variable in your df_ords dataframe with an unintuitive name and change its name without overwriting the dataframe.

##### Refer: 4.1.2.2. Rename 'days_since_prior_order' column (Task)

## 4. View frequency distribution of 'order_hour_of_day'
Your client wants to know what the busiest hour is for placing orders. Find the frequency of the corresponding variable and share your findings.

##### Refer: 4.1.1.3. Frequency distribution 'order_hour_of_day' column (Task)

## 5. Retrieve information for 'department_id' = 4
Determine the meaning behind a value of 4 in the "department_id" column within the df_prods dataframe using a data dictionary.

##### Refer: 5.3. Retrieve information - 'department_id' = 4 (Task)

## 6. Create 'breakfast' subset
The sales team in your client’s organization wants to know more about breakfast item sales. Create a subset containing only the required information.

##### Refer: 6.4. Create 'breakfast' subset (Task)

## 7. Create 'dinner' subset
They’d also like to see details about products that customers might use to throw dinner parties. Your task is to find all observations from the entire dataframe that include items from the following departments: alcohol, deli, beverages, and meat/seafood. You’ll need to present this subset to your client.

##### Refer: 6.5. Create 'dinner' subset (Task)

## 8. View dimensions for 'dinner' subset
It’s important that you keep track of total counts in your dataframes. How many rows does the last dataframe you created have?

##### Refer: 6.5.1. View 'dinner' dimensions (Task)

## 9. Create 'user_id' = 1 subset 
Someone from the data engineers team in Instacart thinks they’ve spotted something strange about the customer with a "user_id" of “1.” Extract all the information you can about this user.

##### Refer: 6.6. Create 'user_id' = 1 subset (Task)

## 10. View descriptive statistics for 'user_id' = 1
You also need to provide some details about this user’s behavior. What basic stats can you provide based on the information you have?

##### Refer: 6.6.1. View descriptive statistics - 'user_id' = 1 (Task)

## 11. Check notebook
Check the organization and structure of your notebook. Be sure to include section headings and code comments.

##### Checked notebook and included section headings and code comments

## 12. Export 'orders' dataframe
Export your df_ords dataframe as “orders_wrangled.csv” in your “Prepared Data” folder.

##### Refer: 7.1. Orders

## 13. Export 'departments' dataframe
Export the df_dep_t_new dataframe as “departments_wrangled.csv” in your “Prepared Data” folder so that you have a “.csv” file of your departments data in the correct format.

##### Refer: 7.2. Departments

## 14. Save and submit notebook
Save your Jupyter notebook and submit it here for your tutor to review.

##### Notebook saved and submitted