# NumPy


NumPy is a popular Python library for scientific computing and numerical operations. It provides powerful tools for working with multi-dimensional arrays and matrices, which are essential for many data science and machine learning applications. NumPy also offers a wide range of mathematical functions and algorithms that can efficiently process large amounts of data. With NumPy, users can easily perform operations such as matrix multiplication, element-wise calculations, and statistical analysis, all while taking advantage of its fast and optimized code.

NumPy is widely used in the data science community and is a foundational library for many other Python data analysis tools. Its ability to work with large datasets efficiently makes it a valuable tool for both data exploration and model building. NumPy also integrates well with other popular Python libraries such as Pandas, Matplotlib, and SciPy, making it an essential part of any data scientist's toolkit. Whether you're a beginner or an advanced user, NumPy can help you perform complex calculations with ease and speed up your data analysis workflow.



NumPy documentation: https://numpy.org/doc/stable/


NumPy is the core library for scientific computing in Python. Foundational Python libraries such as Pandas, SciPy, and Matplotlib are built on top of NumPy.

In [1]:
import numpy as np

### Arrays vs. Python Lists
- Python lists can hold elements of different data types, whereas all the elements in a NumPy array must share the same data type. This makes NumPy very efficient: there's no need for NumPy to check the data type of each element in an array, since they must all be the same. Having only a single data type also means that a NumPy array takes up less space in memory than the same information would if stored as a Python list.
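
The single-data-type rule and the memory saving can be checked with a short sketch like the one below. The variable names are illustrative, and the exact byte counts for the list vary by Python version and platform, so only the comparison matters.

```python
import sys
import numpy as np

# A list may mix types; np.array converts everything to one common dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64: the integers were upcast to floats

# Rough memory comparison for 1,000 integers.
values = list(range(1000))
array_values = np.array(values, dtype=np.int64)

# The list stores pointers, and each Python int is a separate object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# The array stores raw 8-byte integers side by side.
array_bytes = array_values.nbytes

print(list_bytes, array_bytes)
```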

In [2]:
num1 = [4, 5, 6, 7]
num2 = [7, 8, 9, 10]

In [3]:
sum_list = num1 + num2
sum_list

[4, 5, 6, 7, 7, 8, 9, 10]

In [4]:
a1 = np.array(num1)
a2 = np.array(num2)

a1 + a2

array([11, 13, 15, 17])

In [5]:
np.array([num1, num2])

array([[ 4,  5,  6,  7],
       [ 7,  8,  9, 10]])

### Class Task
- Create an array called monthly_sales to store monthly sales for 3 different products over a 12-month period.
- Create a 2D array called total_sales which contains the total sales for each month.
- Concatenate total_sales with monthly_sales into a new array called monthly_sales_with_total.
- Create a 1D array called avg_monthly_sales, which contains the average sales amount for each month.
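
One possible approach to the task is sketched below. The sales figures are randomly generated placeholders, and the layout (one row per product, one column per month) is an assumption. The task does not fix either of these, so treat this as one way to do it rather than the expected answer.

```python
import numpy as np

# Hypothetical data: 3 products (rows) x 12 months (columns).
rng = np.random.default_rng(seed=0)
monthly_sales = rng.integers(100, 1000, size=(3, 12))

# Sum over products for each month; keepdims=True keeps the result 2D, shape (1, 12).
total_sales = monthly_sales.sum(axis=0, keepdims=True)

# Stack the totals underneath the per-product rows, shape (4, 12).
monthly_sales_with_total = np.concatenate([monthly_sales, total_sales], axis=0)

# Average across the 3 products for each month, shape (12,).
avg_monthly_sales = monthly_sales.mean(axis=0)

print(monthly_sales_with_total.shape, avg_monthly_sales.shape)
```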


### Creating arrays using built-in functions
- np.zeros()
- np.ones()
- np.arange()
- np.random - rand(), randn(), randint()
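
A brief sketch of each function listed above follows. The shapes and ranges are arbitrary choices for illustration, and the random values differ on every run.

```python
import numpy as np

zeros = np.zeros((2, 3))       # 2x3 array filled with 0.0
ones = np.ones(4)              # 1D array of four 1.0 values
evens = np.arange(0, 10, 2)    # 0, 2, 4, 6, 8 (the stop value is excluded)

uniform = np.random.rand(2, 2)          # uniform samples in [0, 1)
normal = np.random.randn(3)             # samples from the standard normal distribution
dice = np.random.randint(1, 7, size=5)  # integers from 1 to 6 (the upper bound is excluded)

print(zeros, ones, evens, uniform, normal, dice, sep="\n")
```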

## Pandas

Pandas is a powerful and widely used Python library for data manipulation and analysis. It provides tools for working with structured data, such as tables, spreadsheets, and time series. Pandas can read data from various file formats, such as CSV or Excel files, and manipulate it in many ways, including filtering, sorting, grouping, and aggregating. It also provides tools for data cleaning, handling missing data, and data visualization.
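
As a minimal sketch of these operations, the example below filters, sorts, and groups a small DataFrame. The data and column names are made up for illustration. In practice the DataFrame would usually come from a file, for example via `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sales records, invented for this example.
df = pd.DataFrame({
    "product": ["A", "B", "A", "C", "B"],
    "region": ["North", "North", "South", "South", "South"],
    "units": [10, 4, 7, 12, 9],
})

# Filtering: keep only rows where more than 8 units were sold.
large_orders = df[df["units"] > 8]

# Sorting: order rows from most to fewest units.
by_units = df.sort_values("units", ascending=False)

# Grouping and aggregating: total units per product.
units_per_product = df.groupby("product")["units"].sum()

print(large_orders, by_units, units_per_product, sep="\n\n")
```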

Pandas is built on top of NumPy and is dependent on it. NumPy provides the underlying data structure for Pandas to work with, specifically, the ndarray (N-dimensional array), which is a powerful data structure for performing fast and efficient numerical computations.
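
The relationship can be seen in a small sketch like the one below, which uses a made-up numeric DataFrame. `to_numpy()` hands back the values as an ndarray, and NumPy functions can be applied to a Pandas column directly.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [4.0, 5.0, 6.0]})

# Extract the values as a NumPy array.
values = df.to_numpy()
print(type(values), values.shape)

# A NumPy function applied to a column returns a pandas Series.
roots = np.sqrt(df["x"])
print(roots)
```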

Pandas is a must-have tool for any data scientist or analyst working with data in Python. It provides a user-friendly interface for working with complex data structures and makes data analysis more accessible and efficient. With its powerful data manipulation and transformation capabilities, Pandas has become a popular tool in many industries, including finance, healthcare, and retail. Whether you're analyzing large datasets or working with smaller, more structured data, Pandas provides a versatile set of tools to help you manipulate and analyze your data with ease.

Pandas documentation: https://pandas.pydata.org/docs/

### Class Task

Given the two tables above from a school's student report, answer the following questions:

- Who is the oldest in the class?
- Who was admitted last?
- How many male students do we have in the record?
- What are the ages of the students who are over 20?
- What is the average age of the female students?
- What is the full name of the oldest student?

### Inspecting a DataFrame

When you get a new DataFrame to work with, the first thing you need to do is explore it and see what it contains. There are several useful methods and attributes for this.

- head() - returns the first few rows (the “head” of the DataFrame).
- info() - shows information on each of the columns, such as the data type and the number of non-null values.
- shape - a tuple holding the number of rows and columns of the DataFrame.
- describe() - calculates a few summary statistics for each numeric column.
- columns - an Index of the column names.
