# Introduction to Data Analytics

## What is Data Analytics?

> Data analytics is the process of analysing large amounts of raw data (real-time, historical, structured and unstructured) in order to derive meaningful insights about patterns, trends and relationships with the aim of enhancing business decision-making and optimising business efficiency

The main idea of data analytics is to take the isolated data sources stored across an organisation, clean and integrate that data, and turn it into meaningful information that can be analysed using a variety of tools and techniques.

Data analytics, as a practice, has been around for some time. Experts working in this field were traditionally called Business Analysts (BAs) or Business Intelligence (BI) consultants. However, the role has evolved over time into a more complex discipline, and it's more common to refer to these experts nowadays as Data Analysts. 

Job postings titled Data Analyst sometimes list skills expected from data engineers and data scientists. This is because these three roles overlap in some areas. Data analysts are technical experts whose skills sit at the intersection of IT (software development and data engineering), business knowledge (data visualisation and presentation) and statistics (complex data calculations):

<p align="center">
<img src= "images/data-analyst.jpg" width=400>
<figcaption align="center"><cite>Data Analyst vs Data Engineer vs Data Scientist</cite></figcaption font-style="Italic">
</p>


## Types of Data Analytics

> There are 4 main types of data analytics: descriptive, diagnostic, predictive and prescriptive

<p align="center">
<img src= "images/4-types-of-data-analytics.png" width=600>
<figcaption align="center"><cite>Types of Data Analytics</cite></figcaption font-style="Italic">
</p>

### 1. Descriptive Analytics

- This type of analytics is focused on analysing historical data to better understand _what_ has happened during a specific time period
- It's a technique commonly used to summarise the findings from large datasets into concise results, mainly for sharing with stakeholders and business executives
- For example, Key Performance Indicators (KPIs), such as the average sales revenue per customer per store, are created by analysing much larger sales, product and customer datasets
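As a minimal illustration of how a descriptive KPI condenses raw records, the Python sketch below computes the average sales revenue per customer for each store. The sales records and the `avg_revenue_per_customer` function are hypothetical, and the KPI is assumed here to mean a store's total revenue divided by its number of distinct customers.

```python
from collections import defaultdict

# Hypothetical raw sales records: (store, customer_id, revenue)
sales = [
    ("London", "C1", 120.0),
    ("London", "C2", 80.0),
    ("London", "C1", 40.0),
    ("Leeds", "C3", 200.0),
    ("Leeds", "C4", 100.0),
]


def avg_revenue_per_customer(records):
    """Summarise raw sales into one KPI per store:
    total revenue divided by the number of distinct customers."""
    revenue = defaultdict(float)
    customers = defaultdict(set)
    for store, customer, amount in records:
        revenue[store] += amount
        customers[store].add(customer)
    return {store: revenue[store] / len(customers[store]) for store in revenue}


print(avg_revenue_per_customer(sales))
```

With this sample data the sketch reports 120.0 for London (240 across 2 customers) and 150.0 for Leeds (300 across 2 customers).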


### 2. Diagnostic Analytics

- This type of analytics is focused on analysing data to understand _why_ something has happened
- It's a type of Root Cause Analysis, which aims to dig deeper into insights derived from the simpler Descriptive analytics technique to find the main cause behind the problem or finding
- For example, if the sales for a particular month declined significantly, then a Diagnostic analysis will dig deeper into the details with the goal of finding out why sales dropped
    - Was it for a certain product only?
    - Did sales drop for a specific region only but increased in others?
    - Was it due to severe weather?
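One common diagnostic technique is to break a change down by segment and see which segments account for most of it. The sketch below, using hypothetical monthly sales records and a hypothetical `change_by_segment` helper, compares two months per product and region and lists the largest drops first.

```python
from collections import defaultdict

# Hypothetical monthly sales: (month, product, region, revenue)
sales = [
    ("2023-01", "X", "North", 500.0),
    ("2023-01", "X", "South", 500.0),
    ("2023-01", "Y", "North", 300.0),
    ("2023-02", "X", "North", 100.0),
    ("2023-02", "X", "South", 450.0),
    ("2023-02", "Y", "North", 310.0),
]


def change_by_segment(records, before, after):
    """Return the revenue change per (product, region) between two months,
    sorted so the segments with the largest drop come first."""
    totals = defaultdict(lambda: {before: 0.0, after: 0.0})
    for month, product, region, revenue in records:
        if month in (before, after):
            totals[(product, region)][month] += revenue
    changes = {seg: t[after] - t[before] for seg, t in totals.items()}
    return sorted(changes.items(), key=lambda item: item[1])


for segment, delta in change_by_segment(sales, "2023-01", "2023-02"):
    print(segment, delta)
```

In this made-up data the drop is concentrated in product X in the North region, which would be the natural place to dig further (for example, by checking the weather for that region).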

### 3. Predictive Analytics

- This type of analytics builds on the patterns, trends and correlations found in historical data in order to predict future outcomes
- Predictive analytics leverages advanced data science and machine learning techniques to build models that estimate _what is likely to happen_, based on analysing large amounts of historical data
- For example, some models help predict:
    - What will be the sales revenue next month for a specific product?
    - How many passengers do we expect during the first week of June for a flight from London, UK to Toronto, Canada?
    - How will sales be impacted in the central London store in case of a major snow storm?
- By anticipating future behaviour and trends, organisations can better prepare for situations like:
    - Handling increased demand for certain products during peak periods (such as a holiday season)
    - Managing inventory of products during times of global crises (such as COVID)
    - Forecasting revenue and profit
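Real predictive models are usually far richer than this, but a straight-line trend fitted by ordinary least squares shows the basic idea of learning a pattern from history and extrapolating it. The sketch assumes evenly spaced monthly observations and ignores seasonality; `linear_trend_forecast` and the sales figures are hypothetical.

```python
def linear_trend_forecast(history):
    """Fit y = a + b*t by ordinary least squares, where t = 0, 1, 2, ...
    is the month index, and extrapolate one month ahead."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # prediction for the next index, t = n


# Hypothetical monthly sales revenue for one product
monthly_sales = [100.0, 110.0, 120.0, 130.0]
print(linear_trend_forecast(monthly_sales))
```

Because the sample history rises by exactly 10 per month, the fitted trend predicts 140.0 for the following month.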

### 4. Prescriptive Analytics

- Finally, Prescriptive Analytics helps to suggest _what course of action to pursue_ to deal with an anticipated situation or problem
- It uses the insights derived from Predictive analytics to help make data-driven decisions
- For example, if a retailer has predicted that sales will drop by 70% in the central London store due to a major snow storm next week, this information can help the retailer make contingency plans such as:
    - Focusing on online sales during that period by providing a discount of 20% on products ordered online
    - Reducing staffing for that store throughout the storm
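Prescriptive analytics often comes down to comparing the expected outcome of each candidate action and choosing the best one. The deliberately simplistic sketch below does this for the two contingency plans above; every figure in it (revenues, staff cost, and the assumptions that a 20% discount lifts online volume by 50% and that reduced staffing halves staff cost) is a hypothetical placeholder, not real retail data.

```python
from itertools import product

# Hypothetical one-week figures for the central London store, assuming the
# predicted 70% drop (from a 10,000 baseline) in in-store sales happens.
PREDICTED_STORE_REVENUE = 3_000.0
BASELINE_ONLINE_REVENUE = 2_000.0
BASELINE_STAFF_COST = 4_000.0


def expected_net(online_discount, reduce_staff):
    """Illustrative model: expected revenue minus staff cost for one action plan."""
    online = BASELINE_ONLINE_REVENUE
    if online_discount:
        # Assumption: a 20% discount raises online order volume by 50%,
        # so online revenue becomes 1.5x the volume at 0.8x the price.
        online *= 1.5 * (1 - 0.20)
    # Assumption: reducing staffing halves the week's staff cost.
    staff = BASELINE_STAFF_COST * (0.5 if reduce_staff else 1.0)
    return PREDICTED_STORE_REVENUE + online - staff


def recommend():
    """Score every combination of the two candidate actions and return
    the (online_discount, reduce_staff) pair with the highest expected net."""
    plans = product([False, True], repeat=2)
    return max(plans, key=lambda plan: expected_net(*plan))


discount, reduce_staff = recommend()
print(f"Offer online discount: {discount}; reduce staffing: {reduce_staff}")
```

In practice the model behind each action would come from the predictive step, and the search over actions might use an optimisation method rather than trying every combination.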


## Process for Data Analytics

> Although there is no universally agreed-upon methodology, the process for implementing data analytics in global companies can be viewed broadly as consisting of the seven steps below

<p align="center">
<img src= "images/data-analytics-process2.jpg" width=600>
<figcaption align="center"><cite>Data Analytics Process</cite></figcaption font-style="Italic">
</p>

### 1. Determine Business Problem

- In this step, the detailed business problem we want to analyse is defined
- For instance, the problem could be: 
    - Why did our sales decline by 50% last month for product X?

### 2. Determine Data Sources

- Once the business problem has been defined, the next step is to identify which data sources hold the information needed to answer it
- In this step, the main goal is to create a list of all required databases, files, NoSQL data stores, etc.
- Continuing our previous example, we may need data from the Sales database, Product database, and other information such as weather

### 3. Identifying Required Data

- Once we have identified the data sources required, the next objective is to take a deeper dive into each data source to determine which exact data is needed
- For instance, within the Sales database, which exact tables do we need? Do we need the entire historical dataset or will the previous year's data suffice? How many fields do we need from each table?
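When the data lives in a relational database, a quick way to answer these questions is to inspect the schema and the date range of the data before writing any analysis queries. The sketch below uses Python's built-in `sqlite3` module with a hypothetical in-memory Sales database; the helper names are illustrative, and other database systems expose similar metadata (for example through `information_schema`) with different queries.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: [column names]} for every table in a SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def date_range(conn, table, column):
    """Return (earliest, latest) values of a date column, to judge how much
    history exists. Identifiers cannot be bound as SQL parameters, so only
    pass trusted table and column names here."""
    return conn.execute(f"SELECT MIN({column}), MAX({column}) FROM {table}").fetchone()


# Hypothetical in-memory Sales database, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, sale_date TEXT, amount REAL);
    CREATE TABLE products (product_id INTEGER, name TEXT, category TEXT);
    INSERT INTO sales VALUES (1, 1, '2021-03-01', 10.0), (2, 1, '2023-01-15', 12.0);
""")

print(describe_schema(conn))
print(date_range(conn, "sales", "sale_date"))
```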

### 4. Data Cleansing and Pre-processing

- Once we've determined the exact data needs, the next step is data pre-processing and cleansing
- In this step, invalid or empty values are removed, unnecessary columns are dropped, out-of-range data is excluded, etc.
- For example, we may exclude Sales data that is older than 1 year and any data that is not related to product X
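The cleansing rules in this example can be sketched in plain Python. The raw rows, field names and the `clean` function are hypothetical; the sketch drops rows for other products, rows with empty or unparseable values, rows older than roughly one year (365 days) and non-positive amounts, and keeps only the columns needed later.

```python
from datetime import date, timedelta

# Hypothetical raw rows as they might arrive from an export
raw_rows = [
    {"product": "X", "sale_date": "2023-05-02", "amount": "19.99", "note": "ok"},
    {"product": "X", "sale_date": "2023-05-03", "amount": "", "note": "missing"},
    {"product": "Y", "sale_date": "2023-05-03", "amount": "5.00", "note": ""},
    {"product": "X", "sale_date": "2021-01-10", "amount": "12.50", "note": ""},
    {"product": "X", "sale_date": "2023-06-01", "amount": "-3.00", "note": ""},
]


def clean(rows, product, as_of):
    """Keep only valid, recent rows for one product, with the needed columns."""
    cutoff = as_of - timedelta(days=365)
    cleaned = []
    for row in rows:
        if row["product"] != product:  # not related to the product under study
            continue
        try:
            sale_date = date.fromisoformat(row["sale_date"])
            amount = float(row["amount"])
        except ValueError:  # empty or unparseable values
            continue
        if sale_date < cutoff or amount <= 0:  # older than a year, or out of range
            continue
        # Drop unnecessary columns (e.g. "note") and keep typed values
        cleaned.append({"product": product, "sale_date": sale_date, "amount": amount})
    return cleaned


print(clean(raw_rows, "X", as_of=date(2023, 6, 30)))
```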

### 5. Transform the Data

- Once the data has been cleaned and better prepared, the next goal is to perform ETL/ELT operations on the data
- This might include changing the format of the data from one type to another (such as from CSV to JSON), and/or integrating multiple files or SQL tables together (using JOINs)
- For instance, we'll probably need to `JOIN` the Sales and Product tables and to filter the tables to include only the sales for product X during the past year
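The `JOIN` and filter described above might look like the following, run here through Python's `sqlite3` module so the example is self-contained. The table and column names (`sales`, `products`, `sale_date`, `amount`) and the rows are hypothetical, and the query assumes dates are stored as ISO-8601 text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        sale_date TEXT, amount REAL);
    INSERT INTO products VALUES (1, 'X'), (2, 'Y');
    INSERT INTO sales VALUES
        (1, 1, '2023-05-02', 20.0),
        (2, 2, '2023-05-02', 5.0),
        (3, 1, '2022-01-15', 30.0),
        (4, 1, '2023-06-10', 25.0);
""")

# Join Sales to Products, keep only the chosen product and the past year of sales
query = """
    SELECT p.name, s.sale_date, s.amount
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    WHERE p.name = ? AND s.sale_date >= date(?, '-1 year')
    ORDER BY s.sale_date
"""
rows = conn.execute(query, ("X", "2023-06-30")).fetchall()
print(rows)
```

Passing the product name and reference date as `?` parameters, rather than formatting them into the SQL string, lets the database driver handle quoting safely.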

### 6. Analyse the Data

- Now that the data is in good shape, we can actually start the data analysis task
- In this step, data analysts review the data in detail for clues to help explain _why_ the problem has occurred 
- Continuing with our example, the outcome of this step is to determine, using data-driven approaches, the main reason sales have declined by 50%
- Data visualisation is an important part of this step
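As one example of a data-driven check, the sketch below computes the Pearson correlation between daily snowfall and daily sales of product X, picking up the weather data source mentioned in step 2. The data and the `pearson` function are illustrative; a strong correlation only flags a candidate explanation and does not by itself prove that the weather caused the decline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical daily snowfall (cm) and daily sales of product X
snowfall = [0.0, 2.0, 5.0, 10.0, 1.0]
sales_x = [100.0, 90.0, 70.0, 40.0, 95.0]
print(round(pearson(snowfall, sales_x), 3))
```

A value close to -1, as with this made-up data, indicates that sales tend to fall as snowfall rises.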


### 7. Data Interpretation

- Finally, this step involves documenting and interpreting the findings from the analysis step above
- All relevant findings are summarised, interpreted and presented to business executives so they can understand the root cause of the problem, address it and plan to prevent it from recurring

## Skills of a Data Analyst

> Data analytics is a complex and maturing discipline, and the skills required tend to increase in complexity over time. The main required skills include: SQL, BI Tools, Python, Excel, ETL, statistics, and communication and data presentation

Based on a 2022 study of [200 data analyst job postings in the US](https://www.beamjobs.com/resume-help/data-analyst-skills), the following skills were identified as the most in demand:

<p align="center">
<img src= "images/data-analyst-skills.png" width=600>
<figcaption align="center"><cite>Top skills required for Data Analysts</cite></figcaption font-style="Italic">
</p>

### 1. SQL

- This is the most in-demand skill for Data Analysts, required by 90% of the jobs
- SQL skills include having a general understanding of:
    - Various database tools and technologies (such as MySQL, PostgreSQL)
    - Data modelling techniques
    - Data warehousing principles
    - SQL scripting
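As a small taste of SQL scripting, the query below aggregates sales by product and month with `GROUP BY`, filters the groups with `HAVING` and sorts the result. It runs through Python's built-in `sqlite3` module so the example is self-contained; the `sales` table and its rows are hypothetical, and `strftime` is SQLite's date-formatting function (other databases use different date functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('X', '2023-01-05', 60.0),
        ('X', '2023-01-20', 70.0),
        ('Y', '2023-01-11', 40.0),
        ('X', '2023-02-03', 50.0),
        ('Y', '2023-02-14', 150.0);
""")

# Monthly revenue and order count per product, keeping only groups
# with at least 100 in revenue, ordered by month then revenue
query = """
    SELECT product,
           strftime('%Y-%m', sale_date) AS month,
           SUM(amount) AS revenue,
           COUNT(*) AS orders
    FROM sales
    GROUP BY product, strftime('%Y-%m', sale_date)
    HAVING SUM(amount) >= 100
    ORDER BY month, revenue DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```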

### 2. Business Intelligence (BI) Tools

- This is the second most in-demand skill
- BI tools help to visualise data and create real-time charts, graphs and dashboards
- These tools help to create powerful, interactive reports that non-technical users can use to explore live data
- The most popular BI tools include:
    - Tableau (requested by 46% of the jobs)
    - Power BI
    - QlikView

### 3. Programming Languages

- Coding skills are increasingly required for Data Analyst roles compared with the past
- The most in-demand languages are:
    - Python (mentioned in 40% of the jobs)
    - R (mentioned in 30% of the jobs)

### 4. Excel

- Although it's been around for decades, Microsoft Excel still remains an important skill to have
- Advanced Excel skills are mentioned in 42% of the jobs and remain a core tool for analysing data
- Excel is best suited to smaller datasets, since a single worksheet is limited to 1,048,576 rows (roughly 1 million)

### 5. ETL (Extract, Transform and Load)

- Although ETL falls in the domain of a data engineer, more and more data analyst roles are starting to require this skill
- In the above-mentioned study, 20% of jobs requested ETL skills and 60% of the jobs mentioned it was a "nice to have" skill
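A minimal ETL pipeline can be sketched with Python's standard library alone: extract rows from a CSV export, transform them (skip incomplete rows, convert types, derive a revenue column) and load them into a database table. The CSV content, the `orders` table and the `extract`/`transform`/`load` functions are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would be a file or an API response
RAW_CSV = """order_id,product,quantity,unit_price
1,X,2,9.50
2,Y,1,4.00
3,X,,9.50
"""


def extract(text):
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip incomplete rows, convert types, derive line revenue."""
    out = []
    for row in rows:
        if not row["quantity"]:
            continue
        qty = int(row["quantity"])
        price = float(row["unit_price"])
        out.append((int(row["order_id"]), row["product"], qty, qty * price))
    return out


def load(conn, records):
    """Load: write the transformed records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, product TEXT, quantity INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute(
    "SELECT product, SUM(revenue) FROM orders GROUP BY product ORDER BY product"
).fetchall())
```

Keeping the three stages as separate functions makes each one easy to test on its own; an ELT variant would load the raw rows first and run the transformation inside the database.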

### 6. Statistics and Machine Learning

- Knowledge of statistical techniques was requested in 12% of the jobs
- Having this background helps to better analyse and interpret data
- Some suggested statistical concepts to learn are:
    - A/B testing
    - Multivariate testing
    - Significance testing
    - Supervised and Unsupervised learning models 
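For A/B testing and significance testing, one common approach when comparing two conversion rates is a two-proportion z-test. The sketch below assumes large samples (so the normal approximation holds) and a two-sided test; the `two_proportion_z_test` function name and the visitor and conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates,
    using the pooled proportion under the null hypothesis of no difference.
    Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical A/B test: 2,000 visitors per variant
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these numbers the test gives a p-value of roughly 0.01, which would be considered significant at the conventional 5% level.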

### 7. Communication and Data Presentation

- One of the core non-technical skills required by data analysts is the ability to present and communicate findings to non-technical audiences
- Being able to speak both technical and non-technical languages is a critical skill
- Data presentation techniques using PowerPoint are also a must-have

## Key Takeaways

- Data analytics is the process of analysing large amounts of raw data to derive meaningful information and insights about existing patterns, trends and relationships
- The main goals of data analytics are to enhance business decision-making and increase business efficiency
- Data Analyst is a more modern title for the role traditionally filled by Business Analysts (BAs) and Business Intelligence (BI) consultants
- There are 4 main types of data analytics: Descriptive, Diagnostic, Predictive and Prescriptive
- The process of implementing data analytics in global companies can be broadly viewed as 7 steps: determine the business problem, determine data sources, identify required data, cleanse and pre-process the data, transform the data, analyse the data and interpret the data
- Data analysts need a wide variety of skills, including SQL, BI tools, Excel, programming (Python), ETL, an understanding of statistics and machine learning, and solid communication and presentation skills