# Exercise 4.10: Coding Etiquette & Excel Reporting
## Table of Contents:
1. #### Using Excel with Python
2. #### Data Security
3. #### Instacart Analysis Final Report
4. #### Non-Technical Skills


### Exercise 01. Using Excel with Python

In [3]:
# Import libraries 
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

In [4]:
# Set path
path = r'C:\Users\lance\Documents\Achievement 4 Project'

In [6]:
# Import Instacart project data
instacart = pd.read_pickle(os.path.join(path, '02 Data', 'Prepared Data', 'orders_products_customers_merged.pkl'))

#### Crosstabs in Python

In [7]:
# Crosstabs are a common tool for conducting data checks in Python
# They're comparable to pivot tables in Excel

In [8]:
# Create a crosstab between the 'days_since_prior_order' column and the 'order_number' column
crosstab = pd.crosstab(instacart['days_since_prior_order'], instacart['order_number'], dropna = False)

In [9]:
# Copy the table straight to your clipboard and paste it into Excel
crosstab.to_clipboard()

The output pasted into Excel shows that where the 'days_since_prior_order' column crosses with order numbers of 1, there's a column of 0s. This supports the initial assumption about missing values and means we can safely disregard them.
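The same check can be illustrated on a small scale without leaving Python. The sketch below uses hypothetical toy data (not the Instacart set) and a slightly different technique from the cell above: it first labels each value as missing or present, then crosses that label with 'order_number', so the result does not depend on how the crosstab treats missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two customers whose first orders have no prior order
toy = pd.DataFrame({
    'order_number': [1, 2, 3, 1, 2],
    'days_since_prior_order': [np.nan, 7.0, 14.0, np.nan, 30.0],
})

# Label each value as 'missing' or 'present' so NaNs become an explicit category
missing_label = toy['days_since_prior_order'].isna().map({True: 'missing', False: 'present'})

# Cross the label with the order number
missing_by_order = pd.crosstab(missing_label, toy['order_number'])

# If the assumption holds, every missing value sits in the order_number == 1 column
missing_only_first = bool((missing_by_order.loc['missing'].drop(1) == 0).all())

print(missing_by_order)
print(missing_only_first)
```

If the second printed value is False, some missing values belong to later orders and the assumption needs another look.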

#### Confirming with Stakeholders

It's important to be thorough in the initial exploration of your data. When running missing-value and duplicate checks, transfer the records in question into an Excel file and send it to your client so they can confirm or correct your assumptions.
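As a rough sketch of how the records in question might be collected for a client, the example below uses hypothetical toy data and hypothetical file names. It writes CSV files, which Excel opens directly; pandas also offers `to_excel`, but that requires an Excel engine such as openpyxl to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data with one missing value and one fully duplicated row
df = pd.DataFrame({
    'order_id': [1, 2, 3, 3],
    'days_since_prior_order': [7.0, None, 3.0, 3.0],
})

# Rows with at least one missing value
rows_missing = df[df.isna().any(axis=1)]

# Every copy of any fully duplicated row (keep=False marks all copies)
rows_duplicated = df[df.duplicated(keep=False)]

# Save each check to its own file in a temporary folder (hypothetical names)
out_dir = tempfile.mkdtemp()
missing_path = os.path.join(out_dir, 'missing_values_check.csv')
duplicates_path = os.path.join(out_dir, 'duplicates_check.csv')
rows_missing.to_csv(missing_path, index=False)
rows_duplicated.to_csv(duplicates_path, index=False)
```

Sending the flagged rows themselves, rather than only the counts, lets the client see exactly which records the question is about.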

#### Excel as a Text Editor for Python

Sometimes you can use Excel as a text editor to prep your data for Python. Imagine the following scenario: you've aggregated 120 columns with averages that contain '_mean_' within the name, and now you want to analyze only these columns by creating a subset. Here's one way to save time with Excel:
1. Copy all the column names from Python using the df.columns attribute
2. Paste the column names into Excel in a transposed format
3. Filter for the cells that contain '_mean_' within their names
4. Wrap each name in quotation marks and append a comma using Excel's & concatenation operator
5. Copy the output into Python and create your subset
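
For comparison, the same subset can also be built directly in Python with a list comprehension. This is an alternative to the Excel route above, not a replacement for it, and the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy dataframe: two aggregated '_mean_' columns and one other column
df = pd.DataFrame({
    'price_mean_by_user': [7.5, 8.1],
    'days_mean_by_user': [10.0, 12.5],
    'order_number': [3, 9],
})

# Keep only the column names that contain '_mean_'
mean_columns = [col for col in df.columns if '_mean_' in col]

# Create the subset from those columns
df_means = df[mean_columns]

print(mean_columns)
```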

###  Exercise 02. Data Security 

Data security is a topic that will come up again and again, especially once you land a job in the analysis field. It covers many domains, from storing data securely, to sending sensitive information, to communicating the results of an analysis that contains sensitive data. All of these are things you need to consider as a data analyst no matter what project you’re working on or which client you’re working for. You must be aware of personally identifiable information (PII). If even a single column within the data set can be traced back to a particular person, it's PII. 
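
One common way to handle PII before sharing a data set is to drop the identifying columns. The sketch below assumes a hypothetical customer table; the column names and the choice of which columns count as PII are illustrative only and would need to be decided for each project.

```python
import pandas as pd

# Hypothetical customer data; names and values are made up for illustration
customers = pd.DataFrame({
    'user_id': [101, 102],
    'first_name': ['Ana', 'Ben'],
    'last_name': ['Lopez', 'Khan'],
    'state': ['Ohio', 'Utah'],
    'income': [52000, 61000],
})

# Assumed list of PII columns for this example
pii_columns = ['first_name', 'last_name']

# Drop the PII columns before exporting or sharing the data
customers_safe = customers.drop(columns=pii_columns)

print(customers_safe.columns.tolist())
```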

###  Exercise 03. Instacart Analysis Final Report

The end deliverable of the Instacart Analysis project will be in the format of an Excel report. The report will include tabs referring to the different steps you conducted throughout this Achievement, for instance, your data consistency checks, data wrangling, column derivations, and visualizations.

#### Population Flows

A population flow is a flowchart that describes any change that took place in your data set throughout the prep stage of your analysis. While there are other ways to report these metrics, this approach is good at illustrating how the numbers flowed throughout the checks.
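
The numbers behind a population flow can be gathered by recording the row count after each prep step. The following is a minimal sketch on hypothetical toy data; the step names and the order of the checks are assumptions, and the resulting table could be pasted into Excel with to_clipboard() as shown earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one duplicate row and one missing value
raw = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 4],
    'price': [5.0, 3.5, 3.5, np.nan, 9.0],
})

# Record the row count at each stage of the prep
flow = [('Raw data', len(raw))]

deduped = raw.drop_duplicates()
flow.append(('After removing duplicates', len(deduped)))

cleaned = deduped.dropna()
flow.append(('After removing missing values', len(cleaned)))

# Build a small table showing how the numbers flowed through the checks
population_flow = pd.DataFrame(flow, columns=['step', 'rows'])
rows_before = population_flow['rows'].shift(1)
population_flow['rows_removed'] = (rows_before - population_flow['rows']).fillna(0).astype(int)

print(population_flow)
```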

#### Data Citation 

Always cite the source of your data in any final output or deliverable that will be made publicly available. For example, Instacart has a template for citing its data. You can add the citation to the first tab of the report, named 'Data Citation.'

### Exercise 04. Non-Technical Skills

#### Skepticism and Critical Thinking

Have a vision for what you expect to see for each operation you perform. Question each output that’s returned. There will almost never be a time you receive data that’s free from incorrect or corrupted information, which is why extensive data checks are so vital to your work. Always be skeptical of your data. Only then can you discover and report on any issues it might contain.

#### Precision

Conducting analysis requires considerable attention to detail. You should have a clear overview of everything that’s happening to your project data at any given time—the total number of observations in the data set you received, how the numbers have changed after deduping and removing missing entries, and what columns are available to you.

#### Communication

Communication is perhaps one of the most important skills of all. As an analyst, you need to be capable of communicating with your team and stakeholders, keeping them up to speed on what analyses you’re proceeding with, what analyses you’ve conducted, and what you’ve achieved thus far. If you have a query about the data, or something is taking longer than expected, this should be communicated to all relevant stakeholders in a timely manner.

#### Domain Expertise

After being hired as an analyst, be proactive. Reach out to more-senior colleagues, ask questions about what certain elements of your data mean, and seek to understand the domain in which you’re working. This won’t only aid your development as an analyst; it will also boost your performance as a newcomer at a new firm. You should always try to gain as much expertise as you can in the field you’re analyzing. The insights this will bring to your analysis simply can’t be overstated.