
# Introduction to Data Analytics Libraries

Data analytics is the process of examining raw data to find trends and answer questions. The Python ecosystem offers mature libraries for this work: Numpy, Pandas, and Seaborn are foundational tools for efficient data analysis and visualization.

## Numpy
Numpy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

## Pandas
Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

## Seaborn
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.



## Getting Started with Numpy

Numpy is the core library for numerical computations in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.


### Basic Numpy Arrays
```python
import numpy as np

a = np.array([1, 2, 3])
print(a)
```

In [None]:
#import statement

In [None]:
# for a new installation, run:
# pip install numpy

In [55]:
import numpy

In [24]:
# np is the conventional alias for numpy
import numpy as np

In [19]:
# single dimension array
arr1 = np.array([1,3,5,7,4,6])

In [20]:
type(arr1)

numpy.ndarray

In [21]:
arr1

array([1, 3, 5, 7, 4, 6])

In [22]:
#2d array
np.array([[1,3,5,7],[2,5,7,8]])

array([[1, 3, 5, 7],
       [2, 5, 7, 8]])

In [23]:
#2d array with an explicit float dtype
np.array([[1,3,5,7],[2,5,7,8]], dtype=float)

array([[1., 3., 5., 7.],
       [2., 5., 7., 8.]])

In [32]:
dim = np.array([[1,3,5,7],[2,5,7,8]], dtype=float)

In [34]:
# shape or dimension
dim.shape

# 2 rows, 4 columns

(2, 4)
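
Besides `shape`, an array exposes a few other attributes that describe its layout. The sketch below reuses the `dim` array from the cell above and is only meant as a quick illustration.

```python
import numpy as np

# Same 2-D array as in the cell above
dim = np.array([[1, 3, 5, 7], [2, 5, 7, 8]], dtype=float)

print(dim.ndim)   # number of dimensions: 2
print(dim.shape)  # (rows, columns): (2, 4)
print(dim.size)   # total number of elements: 8
print(dim.dtype)  # element type: float64
```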

### Operations with Numpy Arrays
Numpy arrays support various operations, which are performed element-wise.
```python
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)

# Element-wise multiplication
print(a * b)
```

In [25]:
arr1

array([1, 3, 5, 7, 4, 6])

In [26]:
list2 = [3, 5, 6, 8, 10, 15]

In [27]:
arr2 = np.array(list2)

In [28]:
arr2

array([ 3,  5,  6,  8, 10, 15])

In [29]:
arr1 + arr2

array([ 4,  8, 11, 15, 14, 21])

In [30]:
# element-wise operations fail when the shapes are incompatible

arr3 = np.array([2,4,6])

In [31]:
arr1 + arr3

ValueError: operands could not be broadcast together with shapes (6,) (3,) 
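
The error above can be caught and inspected rather than left to stop the notebook. The following is a minimal sketch, assuming the same `arr1` and `arr3` values; slicing `arr1` down to three elements is just one illustrative way to make the shapes match, and `error_message` is a name introduced here for the example.

```python
import numpy as np

arr1 = np.array([1, 3, 5, 7, 4, 6])
arr3 = np.array([2, 4, 6])

error_message = None
try:
    total = arr1 + arr3  # shapes (6,) and (3,) cannot be combined
except ValueError as err:
    error_message = str(err)
    print(f"shapes {arr1.shape} and {arr3.shape} are incompatible: {err}")

# Taking the first three elements gives two arrays of shape (3,)
total = arr1[:3] + arr3
print(total)  # values 3, 7, 11
```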

In [35]:
#broadcasting

In [36]:
list2

[3, 5, 6, 8, 10, 15]

In [39]:
for i in list2:
    print(i + 5)

8
10
11
13
15
20


In [40]:
arr1

array([1, 3, 5, 7, 4, 6])

In [44]:
#broadcasting
arr1 * 5

array([ 5, 15, 25, 35, 20, 30])
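
Broadcasting is not limited to scalars. As a rough illustration, a 1-D array can be combined with a 2-D array when the trailing dimensions match. The `row` values below are hypothetical and chosen only for the example.

```python
import numpy as np

dim = np.array([[1, 3, 5, 7], [2, 5, 7, 8]], dtype=float)
row = np.array([10, 20, 30, 40])  # hypothetical values, shape (4,)

# Shapes (2, 4) and (4,) are compatible: row is added to every row of dim
shifted = dim + row
print(shifted)
# first row:  11, 23, 35, 47
# second row: 12, 25, 37, 48
```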

### Working with Mathematical Functions
Numpy provides a vast array of mathematical functions to perform operations on arrays.

In [None]:
# commonly used functions:
# square root, mean, max, min, average, ceil, log


In [46]:
arr1

array([1, 3, 5, 7, 4, 6])

In [45]:
np.max(arr1)

7

In [47]:
np.min(arr1)

1

In [48]:
np.average(arr1)

4.333333333333333

In [49]:
np.sum(arr1)

26

In [50]:
#this is done element-wise
np.sqrt(arr1)

array([1.        , 1.73205081, 2.23606798, 2.64575131, 2.        ,
       2.44948974])

In [51]:
np.square(arr1)

array([ 1,  9, 25, 49, 16, 36], dtype=int32)

In [53]:
np.exp(arr1)

array([   2.71828183,   20.08553692,  148.4131591 , 1096.63315843,
         54.59815003,  403.42879349])

In [54]:
np.log(arr2)

array([1.09861229, 1.60943791, 1.79175947, 2.07944154, 2.30258509,
       2.7080502 ])

In [None]:
# Mathematical functions
np.sqrt(arr1)
np.log(arr1)
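
The list at the start of this section also mentions the mean and `ceil`, which the cells above do not run. A short sketch, reusing the `arr1` and `dim` values from earlier cells, might look like this. The `axis` argument is shown as an additional illustration of how reductions work on 2-D arrays.

```python
import numpy as np

arr1 = np.array([1, 3, 5, 7, 4, 6])

print(np.mean(arr1))           # arithmetic mean, equal to np.average(arr1) without weights
print(np.ceil(np.sqrt(arr1)))  # square roots rounded up: 1, 2, 3, 3, 2, 3

dim = np.array([[1, 3, 5, 7], [2, 5, 7, 8]], dtype=float)
column_sums = dim.sum(axis=0)  # one sum per column: 3, 8, 12, 15
row_sums = dim.sum(axis=1)     # one sum per row: 16, 22
print(column_sums)
print(row_sums)
```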


## Exploring Data with Pandas

Pandas is a library offering high-performance, easy-to-use data structures and data analysis tools. The DataFrame, a two-dimensional labeled table, is one of its most important classes.


### Creating a DataFrame
```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 34, 29, 32]}
df = pd.DataFrame(data)
print(df)
```

In [59]:
import pandas as pd

In [None]:
# a DataFrame can be created from a dictionary

In [62]:
data = {'Name': ['Lekan', 'Olaoluwa', 'Peter', 'Suliman'], 'Age': [20,35, 70, 90], 'Income':[5000, 7000, 8000, 6500]}

In [70]:
data

{'Name': ['Lekan', 'Olaoluwa', 'Peter', 'Suliman'],
 'Age': [20, 35, 70, 90],
 'Income': [5000, 7000, 8000, 6500]}

In [63]:
type(data)

dict

In [64]:
pd.DataFrame(data)

|   | Name | Age | Income |
|---|---|---|---|
| 0 | Lekan | 20 | 5000 |
| 1 | Olaoluwa | 35 | 7000 |
| 2 | Peter | 70 | 8000 |
| 3 | Suliman | 90 | 6500 |


In [65]:
#storing to a variable
df = pd.DataFrame(data)

In [66]:
df

|   | Name | Age | Income |
|---|---|---|---|
| 0 | Lekan | 20 | 5000 |
| 1 | Olaoluwa | 35 | 7000 |
| 2 | Peter | 70 | 8000 |
| 3 | Suliman | 90 | 6500 |
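
Once a DataFrame exists, individual columns can be selected by name. The sketch below rebuilds the same data under the illustrative name `people_df` (not used elsewhere in this notebook) and shows a few common lookups.

```python
import pandas as pd

data = {'Name': ['Lekan', 'Olaoluwa', 'Peter', 'Suliman'],
        'Age': [20, 35, 70, 90],
        'Income': [5000, 7000, 8000, 6500]}
people_df = pd.DataFrame(data)  # illustrative name for this example

print(people_df.columns.tolist())  # ['Name', 'Age', 'Income']

ages = people_df['Age']            # a single column is a Series
print(ages.mean())                 # 53.75

subset = people_df[['Name', 'Income']]  # a list of names returns a DataFrame
print(subset)
```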



### Reading Data from Files
Pandas can easily read data stored in different file formats like CSV, Excel, or JSON.
```python
df = pd.read_csv('filename.csv')
```
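
To try `read_csv` without a file on disk, the text can be wrapped in `io.StringIO`. This is a minimal sketch: the CSV content and the name `sample` are hypothetical stand-ins, not part of the dataset used below.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a real file
csv_text = """Product,Price
Soap,428.28
Milk,635.59
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)   # (2, 2): two rows, two columns
print(sample.dtypes)  # Price is parsed as a floating-point column
```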

In [None]:
# type pd.read and press Tab to list the available readers (read_csv, read_excel, read_json, ...)

In [71]:
df = pd.read_csv('Supermarket.csv')

In [77]:
df.head()

|   | Product Identifier | Supermarket Identifier | Product Supermarket Identifer | Product Weight | Product Fat Content | Product Shelf Visibility | Product Type | Product Price | Supermarket Opening Year | Supermarket Size | Supermarket Location Type | Supermarket Type | Average Price per ProductType | Product Supermarket Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NCA29 | CHUKWUDI046 | NCA29_CHUKWUDI046 | 10.5 | Lo Fat | 0.027276 | Household | 428.28 | 2004 | Small | Cluster 1 | Supermarket Type1 | 399.999418 | 8983.31 |
| 1 | FDG53 | CHUKWUDI049 | FDG53_CHUKWUDI049 | 10.0 | Low Fat | 0.045928 | Frozen Foods | 345.3 | 2006 | Medium | Cluster 1 | Supermarket Type1 | 388.0710941 | 4893.63 |
| 2 | NCN05 | CHUKWUDI045 | NCN05_CHUKWUDI045 | 8.235 | Lo Fat | 0.014489 | Health and Hygiene | 459.49 | 2009 | NaN | Cluster 2 | Supermarket Type1 | 367.1430293 | 7323.8 |
| 3 | NCV17 | CHUKWUDI046 | NCV17_CHUKWUDI046 | 18.85 | Low Fat | 0.016108 | Health and Hygiene | 324.41 | 2004 | Small | Cluster 1 | Supermarket Type1 | 367.1430293 | 7541.85 |
| 4 | FDK03 | CHUKWUDI045 | FDK03_CHUKWUDI045 | 12.6 | Normal Fat | 0.07407 | Dairy | 635.59 | 2009 | NaN | Cluster 2 | Supermarket Type1 | 409.5043429 | 11445.1 |


In [78]:
df.tail()

|   | Product Identifier | Supermarket Identifier | Product Supermarket Identifer | Product Weight | Product Fat Content | Product Shelf Visibility | Product Type | Product Price | Supermarket Opening Year | Supermarket Size | Supermarket Location Type | Supermarket Type | Average Price per ProductType | Product Supermarket Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2989 | FDK21 | CHUKWUDI045 | FDK21_CHUKWUDI045 | 7.905 | Low Fat | 0.010033 | Snack Foods | 620.85 | 2009 | NaN | Cluster 2 | Supermarket Type1 | 395.35219 | 1877.56 |
| 2990 | NCJ05 | CHUKWUDI049 | NCJ05_CHUKWUDI049 | 18.7 | Low Fat | 0.04616 | Health and Hygiene | 380.92 | 2006 | Medium | Cluster 1 | Supermarket Type1 | 367.1430293 | 4192.88 |
| 2991 | FDI04 | CHUKWUDI019 | FDI04_CHUKWUDI019 | NaN | Normal Fat | 0.12766 | Frozen Foods | 496.36 | 1992 | small | Cluster 1 | Grocery Store | 388.0710941 | 1977.43 |
| 2992 | DRG37 | CHUKWUDI013 | DRG37_CHUKWUDI013 | 16.2 | Low Fat | 0.019362 | Soft Drinks | 386.74 | 1994 | High | Cluster 3 | Supermarket Type1 | 386.9982375 | 4284.42 |
| 2993 | FDE51 | CHUKWUDI013 | FDE51_CHUKWUDI013 | 5.925 | Normal Fat | 0.096387 | Dairy | 114.02 | 1994 | High | Cluster 3 | Supermarket Type1 | 409.5043429 | 892.17 |


In [79]:
df.shape

(2994, 14)

In [87]:
#slicing
df[2:15]

|   | Product Identifier | Supermarket Identifier | Product Supermarket Identifer | Product Weight | Product Fat Content | Product Shelf Visibility | Product Type | Product Price | Supermarket Opening Year | Supermarket Size | Supermarket Location Type | Supermarket Type | Average Price per ProductType | Product Supermarket Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | NCN05 | CHUKWUDI045 | NCN05_CHUKWUDI045 | 8.235 | Lo Fat | 0.014489 | Health and Hygiene | 459.49 | 2009 | NaN | Cluster 2 | Supermarket Type1 | 367.1430293 | 7323.8 |
| 3 | NCV17 | CHUKWUDI046 | NCV17_CHUKWUDI046 | 18.85 | Low Fat | 0.016108 | Health and Hygiene | 324.41 | 2004 | Small | Cluster 1 | Supermarket Type1 | 367.1430293 | 7541.85 |
| 4 | FDK03 | CHUKWUDI045 | FDK03_CHUKWUDI045 | 12.6 | Normal Fat | 0.07407 | Dairy | 635.59 | 2009 | NaN | Cluster 2 | Supermarket Type1 | 409.5043429 | 11445.1 |
| 5 | FDV02 | CHUKWUDI027 | FDV02_CHUKWUDI027 | NaN | Low Fat | 0.060252 | Dairy | 426.78 | 1992 | Medium | Cluster 3 | Supermarket Type3 | 409.5043429 | 7699.98 |
| 6 | FDF17 | CHUKWUDI017 | FDF17_CHUKWUDI017 | 5.19 | Low Fat | 0.042862 | Frozen Foods | 492.03 | 2014 | NaN | Cluster 2 | Supermarket Type1 | 388.0710941 | 9329.52 |
| 7 | FDA26 | CHUKWUDI019 | FDA26_CHUKWUDI019 | NaN | Norml Fat | 0.129425 | Dairy | 548.37 | 1992 | Small | Cluster 1 | Grocery Store | 409.5043429 | 547.62 |
| 8 | FDZ09 | CHUKWUDI049 | FDZ09_CHUKWUDI049 | 17.6 | Low Fat | 0.105042 | Snack Foods | 409.72 | s006 | Medium | Cluster 1 | Supermarket Type1 | 395.35219 | 7779.87 |
| 9 | FDL08 | CHUKWUDI027 | FDL08_CHUKWUDI027 | NaN | Lo Fat | 0.049478 | Fruits and Veg | 613.54 | 1992 | Medium | Cluster 3 | Supermarket Type3 | 398.2481124 | 9188.04 |
| 10 | FDW20 | CHUKWUDI010 | FDW20_CHUKWUDI010 | 20.75 | Low Fat | 0.040421 | Fruits and Vegetables | 305.43 | 2005 | NaN | Cluster 3 | Grocery Store | 398.2481124 | 923.8 |
| 11 | FDX44 | CHUKWUDI013 | FDX44_CHUKWUDI013 | 9.3 | Low Fat | 0.042931 | Fruits and Vegetables | 221.29 | 1994 | High | Cluster 3 | Supermarket Type1 | 398.2481124 | 3568.69 |
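
Slicing with `df[start:stop]` selects rows by position and excludes the stop position. The sketch below compares it with `iloc` and `loc` on a small stand-in table. The name `small` is illustrative, and the four rows borrow product types and prices from the output above only to keep the example short.

```python
import pandas as pd

# Small stand-in table so the example runs without Supermarket.csv
small = pd.DataFrame({
    'Product Type': ['Household', 'Frozen Foods', 'Dairy', 'Snack Foods'],
    'Product Price': [428.28, 345.30, 635.59, 409.72],
})

rows = small[1:3]            # positions 1 and 2; the stop position is excluded
same_rows = small.iloc[1:3]  # explicit position-based equivalent
by_label = small.loc[1:3]    # label-based slicing includes the label 3

print(rows)
print(len(by_label))  # 3
```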
