# Getting started with Data Science
## Explained in 3 parts - as a superhero trilogy movie series

##### Facebook Developer Circle: Bengaluru
##### Abdul Wasay
##### June 7, 2020




# Part 1: Origins
## Discovering your powers and learning to use them

![Mark-1](./images/ironMan-Mark1.jpg)

#### Welcome to the world of Data Science
<ul>
    <li>Computers started becoming more intertwined with our daily lives</li>
    <li>Cheaper hardware and rapid advances in technology facilitated this almost exponential growth</li>
    <li>Computers needed to store data to offer better functionality, so files came into existence (1960-1970)</li>
    <li>Files were later replaced by relational database systems that offered efficient storage, management and retrieval of data</li>
    <li>Databases grew even more powerful and advanced, and with the arrival of the internet, things got even better</li>
    </ul>
    

<ul>
<li>With such humongous amounts of data came a situation best described as "data rich but information poor"</li>
    <li>Thus emerged the concept of Knowledge Discovery in Databases (KDD), a.k.a. knowledge mining, a.k.a. data mining</li>
</ul>

#### Data Science
<p> Data Science is a multidisciplinary field. It encompasses techniques from various domains.</p>
<ul>
    <li> Database technology</li>
    <li> Machine learning</li>
    <li> Statistics and pattern recognition </li>
    <li> Information retrieval </li>
    <li> Neural networks </li>
    <li> Knowledge-based systems </li>
    <li> Artificial intelligence </li>
    <li> High performance computing </li>
    <li> Data visualization </li>
</ul>



#### So it's just running my code on datasets then?
##### Nah, not really.

Knowledge discovery from data takes the following steps:

- **Data cleaning**: Data will not be readily available in your desired format
- **Data integration**: You can't really look for data in just one place
- **Data selection**: You need to find needles in the haystack
- **Data transformation**: Your code has limitations on the types of data it can run analysis on
- **Pattern evaluation**: As the name suggests, look for patterns in the data
- **Knowledge presentation**: Sometimes pictures speak louder than words
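
The first four steps can be sketched in plain Python. This is only an illustration: the two "store" lists, their field names and their values are made up, and a real project would more likely use a library such as pandas.

```python
# Two hypothetical data sources with inconsistent formatting (made-up data).
store_a = [
    {"customer": " alice ", "item": "Milk", "qty": "2"},
    {"customer": "bob", "item": None, "qty": "1"},  # missing item
]
store_b = [
    {"customer": "Carol", "item": "bread", "qty": "3"},
]


def clean(records):
    """Data cleaning: drop rows with a missing item and normalise the text."""
    return [
        {
            "customer": r["customer"].strip().title(),
            "item": r["item"].strip().lower(),
            "qty": r["qty"],
        }
        for r in records
        if r["item"]
    ]


# Data integration: combine the cleaned records from both sources.
combined = clean(store_a) + clean(store_b)

# Data selection: keep only the fields the analysis needs.
selected = [(r["customer"], r["item"], r["qty"]) for r in combined]

# Data transformation: cast quantities from strings to integers.
transformed = [(customer, item, int(qty)) for customer, item, qty in selected]

print(transformed)
```

Pattern evaluation and knowledge presentation then work on `transformed` rather than on the raw sources.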


What languages can you use to work on datasets and run algorithms?
- **Python**
    - Supports object-oriented, structured and functional styles of programming
    - Vast libraries available for data science applications
    - Not the fastest language compared with other procedural or object-oriented languages, but it gets the job done and is easy to get acquainted with
- **R**
    - Language for statistical computing and graphics
    - Pretty simple "language" to learn
    - Its developers, though, prefer to call it an environment, where the language is but one part in the larger scheme of things
    

## Phew, now that's an entirely new paradigm


### Part 2: Mastering your powers
#### Every superhero needs training: learn to use your knowledge, and the abilities it grants you, to solve problems
![Mark-2](./images/ironMan-Mark2.jpg)

One of the very first things you should do when getting started with anything new is to learn and master the fundamentals. In data science, the fundamental concepts form a long, long list, but you can get started with a few basic algorithms.

While learning these algorithms, try to implement them in code too.

A look at one such algorithm that allows one to generate insights from data:

- Association rule mining 


## Association rule mining

Association Rule Mining is used to identify relations between items based on historical data.

It answers questions such as:

1. How likely is a customer to buy item X if they bought item Y?
2. How likely is a customer to watch movie X if they watched movie Y?

Three metrics are used to evaluate a rule:

1. Support: The popularity of an item, measured as the fraction of all transactions that contain it.

    Support(A) = (Transactions containing A) / (Total transactions)

2. Confidence: The likelihood that item B is also bought if item A is bought.

    Confidence(A -> B) = (Transactions containing both A and B) / (Transactions containing A)

3. Lift: How much more likely B is to be bought when A is bought, compared with how often B is bought overall. A lift above 1 means A and B appear together more often than expected if they were independent.

    Lift(A -> B) = Confidence(A -> B) / Support(B)
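
To make the three formulas concrete, here is a small sketch that computes them by hand on a toy list of transactions. The shopping baskets and the function names are made up for illustration and are not part of the demo dataset.

```python
# Toy transactions (made-up data); each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)


def confidence(a, b, transactions):
    """Of the transactions containing `a`, the fraction that also contain `b`."""
    return support(a | b, transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence of a -> b relative to how common `b` is overall."""
    return confidence(a, b, transactions) / support(b, transactions)


bread, milk = {"bread"}, {"milk"}
print("Support(bread):", support(bread, transactions))
print("Confidence(bread -> milk):", confidence(bread, milk, transactions))
print("Lift(bread -> milk):", lift(bread, milk, transactions))
```

In this toy data the lift of bread -> milk comes out slightly below 1, so buying bread does not make milk any more likely than it already is.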

## Demo on Apriori Algorithm

Step 1: Get the libraries

In [1]:
import numpy as np
import pandas as pd
from apyori import apriori

Step 2: Read the dataset

In [2]:
dataset_movies = pd.read_csv("./datasets/movies/movie_dataset.csv")
print("Number of records found: ", len(dataset_movies))

Number of records found:  7500


Step 3: Convert data frame into list of lists

In [3]:
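# apyori expects a list of transactions, each being a list of item names
dataset_as_list = []
# Only the first 200 of the 7500 rows are converted in this demo
for each_row in range(0, 200):
    row_as_list_of_strings = []
    # Every row is read as 20 columns; missing entries are NaN, which str() turns into 'nan'
    for each_column in range(0, 20):
        single_entry_as_string = str(dataset_movies.values[each_row, each_column])
        row_as_list_of_strings.append(single_entry_as_string)
    dataset_as_list.append(row_as_list_of_strings)
print("First element: ", dataset_as_list[0])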

First element:  ['Beirut', 'Martian', 'Get Out', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan', 'nan']
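
The 'nan' strings in the output above stand for missing entries, and because they are plain strings they are treated as an item like any other movie title. A possible alternative, sketched here with hypothetical helper names, is to drop missing values while building the transactions. Note that doing so would change the counts and rules shown in the rest of this demo.

```python
import math


def is_missing(value):
    """True for None or a float NaN, which is how pandas marks empty cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_transactions(rows):
    """Convert rows into lists of item names, skipping missing entries."""
    transactions = []
    for row in rows:
        items = [str(value) for value in row if not is_missing(value)]
        if items:  # ignore rows that contain no items at all
            transactions.append(items)
    return transactions


# Made-up rows shaped like the dataset above (titles padded with NaN).
sample_rows = [
    ["Beirut", "Martian", "Get Out", float("nan"), float("nan")],
    [float("nan"), float("nan"), float("nan"), float("nan"), float("nan")],
    ["13 Hours", "Beirut", None, None, None],
]
print(rows_to_transactions(sample_rows))
```

With the data frame from Step 2, something like `rows_to_transactions(dataset_movies.values.tolist())` should produce the same kind of list, assuming the empty cells were read as NaN.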


Step 4: Generate association rules

In [4]:
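# Keep only the rules that clear the support, confidence and lift thresholds
association_rules = apriori(
    dataset_as_list,
    min_support=0.0053,
    min_confidence=0.20,
    min_lift=3,
    min_length=2,
)
# apriori returns a generator, so materialise it into a list
association_results = list(association_rules)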

In [5]:
print("Length of association results: ", len(association_results))

Length of association results:  1388


In [6]:
print(association_results[0])

RelationRecord(items=frozenset({'Beirut', '13 Hours'}), support=0.02, ordered_statistics=[OrderedStatistic(items_base=frozenset({'13 Hours'}), items_add=frozenset({'Beirut'}), confidence=0.7999999999999999, lift=6.956521739130434)])
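
The printed record is easier to read once it is unpacked into a "base -> add" rule. The sketch below rebuilds the record shown above using stand-in namedtuples that copy the field names visible in the output, so it runs without apyori installed. The `describe` helper is hypothetical.

```python
from collections import namedtuple

# Stand-ins that mirror the field names visible in the printed record above.
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)

# The first result from the demo, copied from the output above.
record = RelationRecord(
    items=frozenset({"Beirut", "13 Hours"}),
    support=0.02,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"13 Hours"}),
            items_add=frozenset({"Beirut"}),
            confidence=0.7999999999999999,
            lift=6.956521739130434,
        )
    ],
)


def describe(record):
    """Return one readable line per rule contained in a record."""
    lines = []
    for stat in record.ordered_statistics:
        base = ", ".join(sorted(stat.items_base))
        add = ", ".join(sorted(stat.items_add))
        lines.append(
            f"{base} -> {add} (support={record.support:.2f}, "
            f"confidence={stat.confidence:.2f}, lift={stat.lift:.2f})"
        )
    return lines


for line in describe(record):
    print(line)
# prints: 13 Hours -> Beirut (support=0.02, confidence=0.80, lift=6.96)
```

Read against the definitions earlier: the pair appears in 2% of the sampled transactions, 80% of the transactions containing 13 Hours also contain Beirut, and that is roughly 7 times the rate at which Beirut appears overall.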


### Another basic algorithm one can explore
- Decision trees

### Part 3: Using your powers for good
#### Go do something great, good and impactful

![House Party](./images/ironMan-hpProtocol.jpg)

## But before we move forward, a slight detour

## A look at ORM

### What is ORM (or Object Relational Mapping)

- At a highly abstract level, it's just converting data between incompatible type systems
- Used with object-oriented languages

Here's a diagrammatic view: 
![orm](./images/orm.png)

### So what kind of a programming technique is ORM?

- You don't have to worry about SQL queries as such
- Within the comforts of your object oriented language you can map tables in your database into objects in your OOP language
- You can manipulate those objects as you wish and the ORM framework will take care of persisting those new states into the respective tables
- An even better way to design your project would be to totally separate your business logic from the database handling using the Data Access Object (DAO) design pattern
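
To illustrate the mapping idea and the DAO pattern without any framework, here is a hand-rolled sketch using only Python's standard library (`sqlite3` and `dataclasses`). It is not how an ORM framework such as SQLAlchemy works internally; the `Movie` class, the `MovieDAO` class and the `movies` table are all hypothetical names chosen for this example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Movie:
    """Plain object that mirrors one row of the hypothetical movies table."""
    title: str
    year: int
    id: Optional[int] = None


class MovieDAO:
    """Data Access Object: the only place that knows about SQL."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS movies "
            "(id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
        )

    def save(self, movie):
        """Persist the object's state as a new row and record its id."""
        cursor = self.connection.execute(
            "INSERT INTO movies (title, year) VALUES (?, ?)",
            (movie.title, movie.year),
        )
        movie.id = cursor.lastrowid
        return movie

    def find_by_title(self, title):
        """Map a matching row back into a Movie object, or return None."""
        row = self.connection.execute(
            "SELECT id, title, year FROM movies WHERE title = ?", (title,)
        ).fetchone()
        return Movie(id=row[0], title=row[1], year=row[2]) if row else None


# Business logic only deals with Movie objects, never with SQL strings.
dao = MovieDAO(sqlite3.connect(":memory:"))
dao.save(Movie(title="Get Out", year=2017))
found = dao.find_by_title("Get Out")
print(found)
```

An ORM framework generates this kind of mapping code for you, so the business logic keeps working with objects while the framework handles the tables.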

## Demo on ORM

# That's it about ORMs, now back to Part 3 :) 

Once you have mastered the techniques and are well versed in the field of Data Science, you should know where to exercise your capabilities.

How is data science helping the world?
- Uber Eats uses it to optimize delivery routes, thus saving on fuel consumption
- In sports, teams use it to identify players and tactics that are statistically likely to deliver better results (the Oakland Athletics, whose story inspired the movie Moneyball)
- E-commerce sites use it to target their advertisements
- Helping Facebook better connect you to people you may know
- Helping Tinder do the same by matching you to people who share your preferences (wait, dating is mathematics?)

How can you get started?
- Kaggle! -> https://www.kaggle.com/
- Join the Facebook Developer Circles: Data community and connect with other folks who dabble in data science: [FDC: Data](https://www.facebook.com/groups/138761710178602)
- A good practice would be to search for datasets online and try to extract insights from them (r/dataisbeautiful -> https://www.reddit.com/r/dataisbeautiful/)
- Keep exploring :)

References:
- Data Mining: Concepts and Techniques, Han and Kamber
- SQLAlchemy tutorial by Vinay Kudari: https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91
- SQLAlchemy object persistence tutorial: https://www.tutorialspoint.com/python_data_persistence/python_data_persistence_sqlalchemy.html

That'll be all folks!
Thank you for joining in.

Connect with me:
- [LinkedIn - Abdul Wasay](https://www.linkedin.com/in/abdul-wasay-4915a3110/)
- [GitHub - KnightTuring](https://github.com/KnightTuring)
- [Email - abdulwasay50@gmail.com]
