In [1]:
# INTRODUCTION TO FINANCIAL ANALYSIS
# FinTech is often data-driven.
# Traders review market patterns across the globe.
# Real estate analysts scour sales data along with socioeconomic data to decide which neighborhoods to invest in.
# Financial analysis is a core part of what we do as FinTech professionals.
# In this module, you'll apply a three-phase process for financial data analysis:
    # 1. Collect the data.
    # 2. Prepare the data.
    # 3. Analyze the data.

In [3]:
# USING A FRAMEWORK FOR FINANCIAL ANALYSIS
# Although no single best way exists to analyze financial data, the work typically follows the pattern above: collect, prepare, analyze.
# In this lesson, we'll look at the specifics of data collection, which makes up the first phase of the process.
# But first, here's a brief explanation of each phase:
    
    # 1. Collect the data: 
    # In this phase, you collect the data for analysis by importing it into a container that Python and Pandas can work on.
    
    # 2. Prepare the data: 
    # In this phase, you clean the data to account for missing, incomplete, erroneous, and duplicated data.
    # This phase also includes manipulating segments of the dataset to highlight relationships between variables.
    
    # 3. Analyze the data:
    # This phase is iterative and varies with the goal of your analysis.
    # Regardless of the specifics, the analysis involves testing your thesis, reviewing the results, and refining both until you reach a conclusion.
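
# A minimal sketch of the three phases, assuming a tiny made-up dataset.
# The CSV text, the column names (ticker, close), and the choice of a mean as the
# "analysis" are all hypothetical; they only illustrate the collect, prepare, analyze flow.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real data file.
raw_csv = io.StringIO(
    "ticker,close\n"
    "AAA,10.0\n"
    "AAA,10.0\n"
    "BBB,\n"
    "CCC,30.0\n"
)

# 1. Collect: import the data into a DataFrame.
prices_df = pd.read_csv(raw_csv)

# 2. Prepare: drop duplicated rows, then drop rows with missing values.
clean_df = prices_df.drop_duplicates().dropna()

# 3. Analyze: compute a simple summary statistic on the cleaned data.
mean_close = clean_df["close"].mean()
print(mean_close)
```

# In real work, the prepare and analyze steps usually repeat several times as the analysis is refined.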

In [4]:
# REVIEW YOUR FINANCIAL ANALYSIS TOOLS
# In this module, you'll use two main tools for your financial analysis:
    # Pandas
    # JupyterLab IDE
    
# PANDAS:
    # Pandas is a Python library designed specifically for data analysis.
    # Pandas offers a streamlined way of reviewing datasets and includes various functions that simplify importing, updating, and analyzing data.
    # Pandas is designed to work with multiple file formats, such as CSV, which you can import and read with a few lines of code.
    # Additionally, Pandas has various functions for reviewing and preparing data for analysis.
    # Pandas is also optimized to run analysis operations quickly and efficiently.
    # Pandas has a human-friendly coding syntax that's straightforward for beginners to understand and use.
    # Pandas is one of Python's most powerful tools, and mastering it will take your FinTech skills to the next level.

# JUPYTERLAB:
    # JupyterLab is a web-based user interface that you use to run and review Python-based programs.
    # JupyterLab easily integrates with the Anaconda software package and your Conda dev environment.
    # JupyterLab is one of the main IDEs used for Python application development across FinTech and other data-driven industries.

In [5]:
# DATA CONTAINERS
# DATA COLLECTION is the process of importing data into a CONTAINER, or structure, and then organizing the data so that we can work with it.

# SERIES AND DATAFRAMES IN PANDAS
# SERIES: a container for a one-dimensional sequence of data.
# It functions like a column in a spreadsheet and can hold any type of data.
# DATAFRAME: a two-dimensional container made up of one or more Series, each forming a column.
# With DataFrames, we can work with rows and columns (that is, tabular data).
# While Pandas DataFrames are structured like spreadsheets, they offer more powerful ways for us to express and manipulate data than typical spreadsheet software.
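
# A short sketch of the two containers described above.
# The values are copied from the sample sales output in this notebook; the variable names
# (sale_prices, sales, column) are chosen here for illustration only.

```python
import pandas as pd

# A Series holds one labeled sequence of values, like a single spreadsheet column.
sale_prices = pd.Series([84.33, 879.95, 907.58], name="SalePrice")

# A DataFrame holds rows and columns; each column is itself a Series.
sales = pd.DataFrame(
    {
        "FullName": ["Elwanda White", "Lyndon Elliott", "Daisey Sellers"],
        "SalePrice": sale_prices,
    }
)

# Selecting a single column from a DataFrame returns a Series.
column = sales["SalePrice"]
print(type(column).__name__)

# shape reports (number of rows, number of columns).
print(sales.shape)
```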

In [8]:
# IMPORTING DATA
# As previously discussed, to import a CSV file into Pandas we use the `read_csv` function.
# This function accepts a path to the location of the CSV file and automatically reads it.
# It then imports all the data from the CSV into a Pandas DataFrame.

import pandas as pd
from pathlib import Path

# Create a CSV Path variable called csvpath
csvpath = Path("sales.csv")
# Create a variable called sales_df to read and store the sales DataFrame
sales_df = pd.read_csv(csvpath)
# Display the first 5 rows of the DataFrame
sales_df.head()

# CHECK THE DATA
# As demonstrated above, a vital part of this process is making sure the data was correctly imported.
# To do this, we use the `.head()` method, which shows the first five rows of data by default.
# This gives us a quick view of the data to check whether our data appears as expected.
# The output of `sales_df.head()` displays the CSV file's header columns: FullName, Email, Zip, and SalePrice.
# Each value in the rows that follow belongs to the column it appears under.
# A column of numbers, starting with 0, exists to the left of the output.
# This is the index column.
# Unless the code indicates which column from the CSV should be set as the index, Pandas creates one automatically.
# The index column doesn't have a header.

Unnamed: 0,FullName,Email,Zip,SalePrice
0,Elwanda White,alyre2036@live.com,9236,84.33
1,Lyndon Elliott,arrowy1873@outlook.com,1330,879.95
2,Daisey Sellers,toucan2024@outlook.com,7631,907.58
3,Issac Reeves,asarin1958@gmail.com,81168,545.88
4,Bradford Kinney,mibound1801@yandex.com,41721,517.49
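
# A hedged sketch of how the index behaves when importing a CSV.
# It assumes a small in-memory CSV (two rows copied from the sample output above) instead of
# sales.csv, so the variable names csv_text, default_df, and indexed_df are illustrative.
# The `index_col` parameter of `read_csv` tells Pandas which column to use as the index.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a file on disk.
csv_text = (
    "FullName,Email,Zip,SalePrice\n"
    "Elwanda White,alyre2036@live.com,9236,84.33\n"
    "Lyndon Elliott,arrowy1873@outlook.com,1330,879.95\n"
)

# Default behavior: Pandas creates an integer index starting at 0.
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.index))

# With index_col, an existing column becomes the index instead.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col="FullName")
print(list(indexed_df.index))

# head(n) returns the first n rows; n defaults to 5 when omitted.
print(len(default_df.head(1)))
```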
