### Core libraries in Data Science
Python has numerous ready-made tools, called **libraries**, whose functions we can use directly in our notebooks.

For example, if you want to plot a bar graph in your notebook, there is no need to write the plotting function yourself; you can import a ready-made one from `matplotlib` or `seaborn`.

In [1]:
import seaborn as sns
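
As a minimal sketch of the bar graph case, the snippet below draws one with `matplotlib`. The category names and values are made up for illustration, and the `Agg` backend line is only there so the code also runs outside a notebook; `seaborn` offers a similar `barplot` function.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
languages = ["Python", "R", "SQL"]
votes = [12, 5, 8]

fig, ax = plt.subplots()
bars = ax.bar(languages, votes)  # one bar per category
ax.set_xlabel("Language")
ax.set_ylabel("Votes")
ax.set_title("Example bar graph")
```

In a notebook the figure is rendered below the cell once it finishes running.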

In data science there are core libraries that we usually import at the start of a notebook. They are listed below by category:

#### Scientific computing
1. `NumPy` - short for Numerical Python. It helps with math operations, especially those involving arrays and matrices, which are used all the time in data science and machine learning.

2. `Statsmodels` - this one is primarily for statistical modelling and analysis. It's great for exploring data, running regressions, and getting in-depth statistical summaries.

3. `Pandas` - ideal for table-like data, e.g. spreadsheets or SQL tables. It's all about data manipulation, cleaning, and exploration.
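
To make the difference between arrays and tables concrete, here is a small sketch that uses the first three rows of the salary dataset loaded later in this notebook. The variable names are illustrative, and `statsmodels` is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# NumPy: element-wise math on a whole array, no explicit loop
years = np.array([1.2, 1.4, 1.6])
months = years * 12

# Pandas: the same values as a labelled table
df = pd.DataFrame({
    "YearsExperience": years,
    "Salary": [39344.0, 46206.0, 37732.0],
})
mean_salary = df["Salary"].mean()              # column-wise summary
experienced = df[df["YearsExperience"] > 1.3]  # row filtering with a boolean mask
```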

#### Visualizations within notebooks
1. `Matplotlib` - is the de facto standard plotting library for Python. It works well with NumPy and Pandas. It’s super customizable, but it can be a bit low-level, meaning more code is needed for complex charts.

2. `Seaborn` - builds on Matplotlib and makes attractive statistical plots with far less code. If Matplotlib makes the hard stuff possible, Seaborn makes it easy.


#### Machine Learning
1. `Scikit-Learn` - is the go-to library for machine learning in Python. It wraps up a bunch of ML algorithms and tools into a simple, consistent interface. It’s fast, well-documented, and widely used in both research and industry.

2. `TensorFlow` - built by Google, TensorFlow is a powerful library for training deep neural networks. It’s designed for speed and performance, especially on large datasets. It expresses computations as data flow graphs and can scale to run across CPUs, GPUs, or even whole clusters.

3. `Keras` - is a high-level deep learning API, most commonly used on top of TensorFlow (earlier versions also supported Theano, and Keras 3 can run on JAX or PyTorch as well). It’s built to be user-friendly, modular, and easy to extend. It’s a great way to get started with deep learning without getting too deep into the weeds.

The last two in the machine learning category are primarily for deep neural networks.
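
The "simple, consistent interface" of Scikit-Learn can be sketched with a linear regression. The numbers below are the first ten rows of the salary dataset loaded later in this notebook; choosing `LinearRegression` here is an assumption made for illustration, not a step the notebook itself performs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First ten rows of the salary dataset (YearsExperience, Salary)
years = np.array([1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8])
salary = np.array([39344.0, 46206.0, 37732.0, 43526.0, 39892.0,
                   56643.0, 60151.0, 54446.0, 64446.0, 57190.0])

X = years.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
model = LinearRegression().fit(X, salary)

slope = model.coef_[0]          # estimated salary change per extra year
intercept = model.intercept_
predicted = model.predict(np.array([[5.0]]))  # estimate for 5 years of experience
```

Most Scikit-Learn estimators follow the same pattern: create the object, call `fit`, then call `predict`.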



In [3]:
import pandas as pd


# Load the dataset
data = pd.read_csv('Salary_dataset.csv')

data.head()

,Unnamed: 0,YearsExperience,Salary
0,0,1.2,39344.0
1,1,1.4,46206.0
2,2,1.6,37732.0
3,3,2.1,43526.0
4,4,2.3,39892.0
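
The `Unnamed: 0` column is usually a row index that was saved into the CSV file. A possible way to avoid it is the `index_col` parameter of `pd.read_csv`. The sketch below assumes the file's first header cell is empty, and it uses an in-memory stand-in (`csv_text`, a hypothetical name) instead of the real file.

```python
import io
import pandas as pd

# Stand-in for the file, using the first rows shown above
csv_text = """,YearsExperience,Salary
0,1.2,39344.0
1,1.4,46206.0
2,1.6,37732.0
"""

default = pd.read_csv(io.StringIO(csv_text))               # unnamed column kept as data
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column used as the index

print(list(default.columns))
print(list(indexed.columns))
```

With `index_col=0` only `YearsExperience` and `Salary` remain as data columns.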


In [4]:
data.head(10)

,Unnamed: 0,YearsExperience,Salary
0,0,1.2,39344.0
1,1,1.4,46206.0
2,2,1.6,37732.0
3,3,2.1,43526.0
4,4,2.3,39892.0
5,5,3.0,56643.0
6,6,3.1,60151.0
7,7,3.3,54446.0
8,8,3.3,64446.0
9,9,3.8,57190.0


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Unnamed: 0       30 non-null     int64  
 1   YearsExperience  30 non-null     float64
 2   Salary           30 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 852.0 bytes
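
The facts that `info()` prints (row count, missing values, column types) can also be read as separate values when you want to use them in code. A minimal sketch on a three-row stand-in DataFrame built from the first rows of the dataset:

```python
import pandas as pd

# Small stand-in for the loaded data
df = pd.DataFrame({
    "YearsExperience": [1.2, 1.4, 1.6],
    "Salary": [39344.0, 46206.0, 37732.0],
})

n_rows, n_cols = df.shape             # (rows, columns)
missing_per_column = df.isna().sum()  # count of missing values per column
column_types = df.dtypes              # dtype of each column
```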


In [6]:
data.tail()

,Unnamed: 0,YearsExperience,Salary
25,25,9.1,105583.0
26,26,9.6,116970.0
27,27,9.7,112636.0
28,28,10.4,122392.0
29,29,10.6,121873.0


In [7]:
data.tail(10)

,Unnamed: 0,YearsExperience,Salary
20,20,6.9,91739.0
21,21,7.2,98274.0
22,22,8.0,101303.0
23,23,8.3,113813.0
24,24,8.8,109432.0
25,25,9.1,105583.0
26,26,9.6,116970.0
27,27,9.7,112636.0
28,28,10.4,122392.0
29,29,10.6,121873.0
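
As the cells above show, `head()` and `tail()` return five rows by default and accept a row count. A short sketch on a 30-row stand-in DataFrame (the `value` column is hypothetical) also shows that a negative count means "all rows except that many":

```python
import pandas as pd

df = pd.DataFrame({"value": range(30)})  # 30 rows, like the salary dataset

first_five = df.head()             # default n=5
last_ten = df.tail(10)             # last 10 rows
all_but_last_three = df.head(-3)   # everything except the last 3 rows
```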
