# The Notebook

Since graduating from a health-related course, I have been thinking about what I want to do for my career. Early on I realized that I no longer saw myself on the expected path of becoming a clinician. What followed were several years of searching for my passion. Finally, two years ago, I got involved in research, and there I was introduced to data. I was thrilled when I got the chance to analyze the research findings. This drove me to learn more about data analysis, which led me to data science. Unfortunately, I did not know how to start learning data science. I felt out of my depth, since data science is far removed from the course I finished. I applied to several data science programs but was not lucky enough to be accepted. So I decided to start the journey by myself, gathering information from free courses, blogs, and free books. This is my notebook.

## Contents
- [Libraries](#Libraries)
- [Coding](#Coding)
- [Data Analytics Framework](#DataAnalyticsFramework)
- [Python Basics](#PythonBasics)

## Libraries
[back](#Contents)
<br> The following are the libraries most commonly used in data science:

### [NumPy](https://numpy.org/devdocs/user/index.html)

The universal standard for working with numerical data in Python. It is used extensively in pandas, SciPy, Matplotlib, and scikit-learn. The library provides multidimensional array and matrix data structures, along with fast mathematical operations on them.

In [None]:
import numpy as np
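A minimal sketch of NumPy's core data structure, the n-dimensional array: it builds a 2 x 3 array from made-up values and applies an element-wise operation, a column-wise reduction, and a matrix product.

```python
import numpy as np

# A 2 x 3 array (matrix) built from nested Python lists
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.shape)        # (2, 3): 2 rows, 3 columns
print(a * 10)         # element-wise: every entry multiplied by 10
print(a.sum(axis=0))  # column sums: [5 7 9]
print(a @ a.T)        # matrix product of a (2x3) and its transpose (3x2): [[14 32] [32 77]]
```

Operations like `a * 10` work on the whole array at once, without writing a Python loop.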

### [Scipy](https://docs.scipy.org/doc/scipy/reference/)

A general-purpose scientific library with advanced solvers for tasks such as optimization, integration, interpolation, and statistics.

In [1]:
import scipy
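A small sketch of two SciPy subpackages in use: `optimize.minimize_scalar` finds the minimum of a one-variable function, and `stats.describe` summarizes a sample. The function and the sample values are made up for illustration. It is common to import the needed subpackages explicitly, as here.

```python
from scipy import optimize, stats

# Minimize a simple one-variable function, f(x) = (x - 2)^2 + 1
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(result.x, result.fun)  # approximately 2.0 and 1.0

# Descriptive statistics of a small made-up sample
sample = [2.1, 2.5, 1.9, 2.8, 2.4]
summary = stats.describe(sample)
print(summary.nobs, summary.mean, summary.variance)
```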

### [Pandas](https://pandas.pydata.org/docs/getting_started/tutorials.html)

A Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Its two primary data structures are the Series (one-dimensional) and the DataFrame (two-dimensional).

In [None]:
import pandas as pd
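A minimal sketch of the two data structures mentioned above. The city names and sales figures are invented sample data; the calls used (`pd.Series`, `pd.DataFrame`, boolean filtering, `groupby`) are standard pandas.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([3, 5, 7], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table of labeled columns
df = pd.DataFrame({
    "city": ["Manila", "Cebu", "Davao", "Manila"],
    "sales": [100, 80, 60, 120],
})

print(s["b"])                             # label-based access: 5
print(df[df["sales"] > 70])               # keep only rows where sales exceed 70
print(df.groupby("city")["sales"].sum())  # total sales per city
```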

### [Matplotlib](https://matplotlib.org/3.1.1/contents.html)
#### Pyplot

`matplotlib.pyplot` is a collection of functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.

In [None]:
import matplotlib.pyplot as plt
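A minimal sketch of the pyplot style described above, where each call changes the current figure. The data and the output filename `squares.png` are arbitrary; in a notebook you would typically display the figure with `plt.show()` instead of saving it.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

plt.figure()                        # create a figure
plt.plot(x, y, marker="o")          # plot a line in the current plotting area
plt.xlabel("x")                     # decorate the plot with labels
plt.ylabel("x squared")
plt.title("A simple pyplot example")
plt.savefig("squares.png")          # write the figure to a file (arbitrary filename)
```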

### [Seaborn](https://seaborn.pydata.org/)

Seaborn aims to make visualization a central part of exploring and understanding data. Its dataset-oriented plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots.

In [None]:
import seaborn as sns
sns.set_theme()  # apply seaborn's default theme (set_theme is the newer name for sns.set())
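A minimal sketch of seaborn's dataset-oriented interface on a tiny made-up DataFrame. Passing column names to `x` and `y` is the semantic mapping, and by default `barplot` aggregates each group to its mean, so the two bars have heights 2.0 and 3.0. In a notebook the plot is displayed automatically.

```python
import pandas as pd
import seaborn as sns

sns.set_theme()  # seaborn's default theme

# Small made-up dataset in "long" form: one row per observation
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 2.0, 4.0],
})

# barplot aggregates internally: each bar shows the mean score of its group
ax = sns.barplot(data=df, x="group", y="score")
ax.set_title("Mean score per group")
```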

### [Bokeh](https://docs.bokeh.org/en/latest/docs/user_guide/quickstart.html)

Bokeh is a library for interactive data visualization. It renders its graphics using HTML and JavaScript, which makes it a good candidate for building web-based dashboards and applications.

In [None]:
from bokeh.plotting import figure, output_file, show
# this is just an example
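A minimal sketch of building a Bokeh plot from made-up data. To keep it runnable without a browser, it renders the plot to a standalone HTML string with `file_html` instead of calling `show`; in a notebook or script, `output_file(...)` followed by `show(p)` would open the interactive version.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Create a figure and add a line renderer to it
p = figure(title="Simple line example", x_axis_label="x", y_axis_label="y")
p.line(x, y, line_width=2)

# Render the plot to a standalone HTML page (HTML + JavaScript loaded from the CDN)
html = file_html(p, CDN, "Simple line example")
```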

### [Scikit-learn](https://scikit-learn.org/stable/)

Scikit-learn is a Python library that provides many unsupervised and supervised learning algorithms.

In [None]:
from sklearn.ensemble import RandomForestClassifier
# this is just an example
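A minimal supervised-learning sketch using the `RandomForestClassifier` imported above. It uses the iris dataset bundled with scikit-learn, so no download is needed; the split size and `random_state` values are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Iris: 150 labeled flowers, 4 numeric features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # learn from the training split
predictions = model.predict(X_test)  # predict labels for unseen data

accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```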

### [Natural Language Toolkit](https://www.guru99.com/nltk-tutorial.html)

The Natural Language Toolkit (NLTK) is one of the most powerful NLP libraries. It contains packages that help machines understand human language and respond to it appropriately.

In [None]:
from nltk.tokenize import RegexpTokenizer
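A minimal sketch of the tokenizer imported above. `RegexpTokenizer` splits text with a regular expression and needs no extra NLTK data downloads. The pattern `\w+` keeps runs of letters, digits, and underscores, which is why the contraction "doesn't" is split into two tokens.

```python
from nltk.tokenize import RegexpTokenizer

# Keep runs of word characters; punctuation and spaces are dropped
tokenizer = RegexpTokenizer(r"\w+")
tokens = tokenizer.tokenize("Data science starts with clean text, doesn't it?")
print(tokens)
# ['Data', 'science', 'starts', 'with', 'clean', 'text', 'doesn', 't', 'it']
```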

### [GeoPandas](https://geopandas.org/)

GeoPandas is an open source project to make working with geospatial data in Python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and on descartes and matplotlib for plotting.

Dependencies:
- numpy
- pandas
- shapely
- fiona
- pyproj
- six

In [None]:
import geopandas
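A minimal sketch of a GeoDataFrame built from two shapely points with made-up coordinates. No coordinate reference system (CRS) is set, so distances are in plain planar units; real geospatial data would normally carry a CRS.

```python
import geopandas as gpd
from shapely.geometry import Point

# A GeoDataFrame is a pandas DataFrame with a special "geometry" column
gdf = gpd.GeoDataFrame(
    {"name": ["origin", "site"]},
    geometry=[Point(0, 0), Point(3, 4)],
)

# Spatial operations are performed by shapely, row by row
distances = gdf.distance(Point(0, 0))
print(distances)          # 0.0 and 5.0
print(gdf.total_bounds)   # [minx, miny, maxx, maxy] = [0. 0. 3. 4.]
```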

## Coding
[back](#Contents)

This section provides links to guides and standards on how to write proper, readable, well-documented code.

### [Shortcut](https://yoursdata.net/jupyter-lab-shortcut-and-magic-functions-tips/)

There are several keyboard shortcuts and magic functions that can be used to make life easier.

### [Markdown](https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed)

In Jupyter notebooks, Markdown is a very useful way to describe your code. With it, one can add text, images, tables, and videos, and create hyperlinks both within the document and to outside resources.

### [Style Guide](https://www.python.org/dev/peps/pep-0008/)

PEP 8 is a document that gives coding conventions for Python code. It guides programmers on how to format their code, but it also states that when a project has its own style guidelines that differ from PEP 8, the project-specific guidelines take precedence within that project.
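A small before-and-after sketch of a few PEP 8 conventions: function naming, whitespace around operators, indentation, docstrings, and blank lines. The function names `ComputeMean` and `compute_mean` are hypothetical examples.

```python
# Not PEP 8: CapWords function name, no spaces around operators, body on the def line
def ComputeMean(values):return sum(values)/len(values)


# PEP 8 style: snake_case name, 4-space indentation, spaces around operators,
# a docstring, and two blank lines around top-level definitions
def compute_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


print(compute_mean([2, 4, 6]))  # 4.0
```

Both functions behave identically; PEP 8 is about readability, not behavior.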

## DataAnalyticsFramework
[back](#Contents)

The [Analytics Association of the Philippines](https://aap.ph/) has created a framework to guide the conversation about data analytics by defining key terminologies.

#### Data Analytics Skill

|Skill|Definition|Job|Level 0|Level 1|Level 2|Level 3|
|-----|----------|---|-------|-------|-------|-------|
|**Business and Organizational Skills**|||||||
|Domain Knowledge and application|Apply domain-related knowledge and insights to contextualize data.|Functional Analysts|No skill|You understand collected data, and how it is handled and applied in the specific industry domain.|You develop content strategy and information architecture to support a given industry domain.|You make business cases to improve domain-related procedures through data-driven decision-making.|
|Data Management and governance|Develop and implement data management strategies, incorporating privacy and data security, policies and regulations, and ethical considerations.|Data Stewards|No skill|You are aware of and always apply policies and measures to ensure data security, privacy, intellectual property, and ethics.|You enforce policies and procedures for data security, privacy, intellectual property, and ethics.|You develop policies on data security, privacy, intellectual property, and ethics.|
|Operational Analytics|Use general and specialized analytics techniques to investigate all relevant data and derive insight for decision-making.|Analytics Managers|No skill|You perform business analysis for specified tasks and data sets.|You identify business impact from trends and patterns.|You identify new opportunities to use historical data to optimize organizational processes.|
|Data Visualization and presentation|Create and communicate compelling and actionable insights from data using visualization and presentation tools and technologies.||No skill|You prepare data visualization reports or narratives based on provided specifications.|You create infographics for effective presentation and communication of actionable outcomes.|You select appropriate visualization methods, and develop new ones, for a specific industry.|
|**Technical skills**|||||||
|Research methods|Strategies, processes, or techniques used to collect data or evidence for analysis in order to uncover new information or create a better understanding of a topic.|Data Scientists|No skill|You understand and use the 4-step research model: hypothesis, research methods, artifact, evaluation.|You develop research questions around identified issues within existing research or business process models.|You design experiments, including data collection (passive and active), for hypothesis testing and problem solving.|
|Data engineering principles|Bring together all the needed data from various sources: extract, clean, aggregate, and transform it, and finally load it into the identified data repositories.|Data Engineers|No skill|You have the knowledge and ability to program selected SQL and NoSQL platforms for data storage and access, in particular to write ETL scripts.|You design and build relational and non-relational databases and ensure effective ETL processes for large datasets.|You have advanced knowledge and experience of using modern Big Data technologies to process different data types from multiple sources.|
|Statistical Techniques|Mathematical formulas applied to the analysis of raw research data. These techniques extract information from research data and provide different ways to assess the robustness of research outputs.|Data Scientists|No skill|You know and use statistical methods such as sampling, ANOVA, hypothesis testing, descriptive statistics, regression analysis, and others.|You select and recommend appropriate statistical methods and tools for specific tasks and data.|You identify problems with collected data and suggest corrective measures, including additional data collection, inspection, and pre-processing.|
|Data Analytics Methods and Algorithms|Implement and evaluate machine learning methods and algorithms on the data to derive insights for decision-making.|Data Scientists|No skill|You demonstrate understanding of, and perform, statistical hypothesis testing; you can explain the statistical significance of collected data.|You apply quantitative techniques (e.g., time series analysis, optimization, simulation) to deploy appropriate models for analysis and prediction.|You assess data for reliability and appropriateness; you select appropriate approaches and assess their impact on the analysis and the quality of the results.|
|Computing|Apply information technology and computational thinking, and use programming languages and software and hardware solutions for data analysis.|Data Engineers, Data Scientists|No skill|You perform basic data manipulation, analysis, and visualization.|You apply computational thinking to transform formal data models and process algorithms into program code.|You select appropriate application and statistical programming languages and development platforms for specific processes and data sets.|
|**21st Century Skills**|
|Critical Thinking|Demonstrating the ability to apply critical thinking skills to solve problems and make effective decisions|
|Communication|Understanding and communicating ideas|
|Collaboration|Working with others|
|Creativity and Attitude|Delivering high-quality work and focusing on the final result; showing initiative and taking intellectual risks|
|Planning and Organizing|Planning and prioritizing work to manage time effectively and accomplish tasks|
|Business fundamentals|Having fundamental knowledge of the organization and the industry|
|Customer Focus|Actively looking for ways to identify market demands and meet client needs|
|Working with Tools and Technology|Selecting, using, and maintaining tools and technology to facilitate work activity|
|Dynamic (self-) re-skilling|Ability to adapt to change|
|Professional network|Involvement in professional network activities|
|Ethics|Ethical use of technology, including avoiding bias in data collection and presentation|

## PythonBasics
[back](#Contents)

### Basic operations

In [4]:
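# In a notebook, only the value of the last expression is shown as the cell output
# Numbers
10 + 4       # add: 14
10 - 4       # subtract: 6
10 * 4       # multiply: 40
10 ** 4      # exponent: 10000
10 / 4       # divide (always returns a float): 2.5
5 % 4        # modulo (remainder): 1
10 // 4      # floor division: 2
# Boolean operations
5 > 4                          # True
5 >= 3                         # True
5 != 3                         # True
5 == 5                         # True
5 > 3 and 6 > 3                # True
5 > 3 or 5 < 3                 # True
not False                      # True
False or not False and True    # True; evaluation order: not, and, or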

True
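A short sketch expanding on the division and Boolean operators above. Floor division and modulo always satisfy `(a // b) * b + a % b == a`, which explains their behavior with negative numbers, and explicit parentheses show the `not`, `and`, `or` precedence.

```python
a, b = 10, 4
print(a // b, a % b)              # 2 2
print((a // b) * b + a % b == a)  # True: floor division and modulo fit together

# Floor division rounds toward negative infinity, so negatives can surprise
print(-7 // 2, -7 % 2)            # -4 1
print(int(-7 / 2))                # -3: int() truncates toward zero instead

# Precedence: not binds tighter than and, which binds tighter than or
print(False or not False and True)      # True
print(False or ((not False) and True))  # same expression with explicit parentheses
```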

### Functions

A function is a set of instructions that runs when it is called. It can take multiple input values and return a value.
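A complete, runnable sketch of the definition above: a hypothetical `rectangle_area` function with two inputs, one of them with a default value, and a return value.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle.

    width and height are the inputs; height has a default value,
    so the function can be called with one or two arguments.
    """
    return width * height  # the return value is handed back to the caller


print(rectangle_area(3, 4))  # 12
print(rectangle_area(5))     # 5.0
```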

In [None]:
def double(x):
    '''This is an example of what a function looks like.
    The function starts with def, then the name of the function.
    The inputs are enclosed in the parentheses that follow.
    There can be more than one input. After the colon comes this
    docstring, enclosed in three quotation marks. '''