## NumPy



* NumPy, which stands for Numerical Python, is foundational for numerical computing in Python.
* Designed for scientific computation and used extensively for data analysis because it handles large, multi-dimensional arrays and matrices efficiently.
* Serves as the basis for many other Python data science libraries, including Pandas, due to its speed and efficiency in numerical computations.
* Common use cases for NumPy in data analysis:
    * Array operations
    * Linear algebra
    * Statistical functions
    * Random number generation
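
Each use case above corresponds to a small part of the NumPy API. Here is a minimal sketch that uses made-up numbers (weekly hours worked and a two-equation system) chosen only for illustration. It is not data from these notes.

```python
import numpy as np

# Array operations: arithmetic applies to every element at once
hours = np.array([38, 40, 45, 42])           # made-up weekly hours
overtime = hours - 40                        # -> [-2, 0, 5, 2]

# Linear algebra: solve the system 2x + y = 5 and x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)             # x = 1.0, y = 3.0 (up to rounding)

# Statistical functions
median_hours = np.median(hours)              # 41.0
spread = np.std(hours)                       # population standard deviation (ddof=0)

# Random number generation: a seeded Generator gives reproducible draws
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=0.0, scale=1.0, size=5)
```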

### Importance 

* Central to data analysis in Python.
* Provides a structured way to store data (the `ndarray`), which makes data easier to organize, access, and manipulate.
* The Pandas library is built on top of NumPy.

For more information on NumPy, check out the official documentation [here](https://numpy.org/doc/1.26/).

### Array Operations

#### Notes
* Creation: **`import numpy as np; np.array([1, 2, 3])`**
* Basic Operations: `+`, `-`, `/`, `*` are performed element-wise
* Slicing: **`array[1:3]`**
* Boolean Indexing: **`array[array > 0]`**
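
You can try the four notes above on a small made-up array. The expected results are shown as comments.

```python
import numpy as np

# Creation: build an array from a Python list
array = np.array([4, -2, 7, 0, 9])

# Basic operations are element-wise: each element pairs with its counterpart
doubled = array * 2                               # -> [8, -4, 14, 0, 18]
halved = array / 2                                # true division always gives floats
shifted = array + np.array([10, 20, 30, 40, 50])  # -> [14, 18, 37, 40, 59]

# Slicing: positions 1 and 2 (the stop index 3 is excluded)
middle = array[1:3]                               # -> [-2, 7]

# Boolean indexing: the comparison builds a True/False mask that selects elements
mask = array > 0                                  # -> [True, False, True, False, True]
positives = array[mask]                           # -> [4, 7, 9]
```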

#### Examples

Let's start by creating a fictional array with the number of years of experience required for four different data science job listings.

```python
import numpy as np
```

```python
my_array = np.array([1, 2, 3, 4])
```

```python
my_array.mean()
```

```
2.5
```

```python
# Job titles
job_titles = np.array(['Data Analyst', 'Data Scientist', 'Data Engineer', 'Machine Learning Engineer'])

# Base salaries
base_salaries = np.array([60000, 80000, 75000, 90000])

# Bonus rates
bonus_rates = np.array([.05, .1, .08, .12])
```

```python
total_salary = base_salaries * (1 + bonus_rates)

total_salary
```

```
array([ 63000.,  88000.,  81000., 100800.])
```
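
To see what the element-wise expression `base_salaries * (1 + bonus_rates)` does, the sketch below compares it with an equivalent plain-Python loop. The scalar `1` is broadcast across every element of `bonus_rates` before the multiplication. The sketch also shows that two arrays of different lengths cannot be paired element by element. The `looped` and `mismatch_caught` names are illustrative only.

```python
import numpy as np

base_salaries = np.array([60000, 80000, 75000, 90000])
bonus_rates = np.array([.05, .1, .08, .12])

# Vectorized: the scalar 1 is broadcast across bonus_rates,
# then each salary is multiplied by the rate at the same position
vectorized = base_salaries * (1 + bonus_rates)

# Equivalent plain-Python loop that pairs the i-th salary with the i-th rate
looped = [salary * (1 + rate) for salary, rate in zip(base_salaries, bonus_rates)]
print(np.allclose(vectorized, looped))   # True

# Arrays of different lengths cannot be paired up element by element
mismatch_caught = False
try:
    base_salaries * np.array([.05, .1])
except ValueError:
    mismatch_caught = True
print(mismatch_caught)                   # True
```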

```python
np.mean(total_salary)
```

```
83200.0
```

```python
# In some cases, data is missing for some entries.
# Element-wise math needs arrays of matching length, so np.nan ("not a number")
# stands in for the missing values.

# Job titles: there is an extra title, but no salary data for it
job_titles = np.array(['Data Analyst', 'Data Scientist', 'Data Engineer', 'Machine Learning Engineer', 'AI Engineer'])

# Base salaries (np.nan marks the missing value)
base_salaries = np.array([60000, 80000, 75000, 90000, np.nan])

# Bonus rates
bonus_rates = np.array([.05, .1, .08, .12, np.nan])
```

```python
total_salary = base_salaries * (1 + bonus_rates)

total_salary
```

```
array([ 63000.,  88000.,  81000., 100800.,     nan])
```

```python
np.mean(total_salary)
```

```
nan
```

```python
np.nanmean(total_salary)
```

```
83200.0
```
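
`np.nanmean` skips `nan` entries when it averages. The sketch below shows two related techniques on the same totals. The first drops missing values with a boolean mask built by `np.isnan`. The second uses other `nan`-aware reductions such as `np.nansum` and `np.nanmax`. The `has_value` name is illustrative only.

```python
import numpy as np

total_salary = np.array([63000., 88000., 81000., 100800., np.nan])

# np.isnan is True where a value is missing; ~ inverts the mask
has_value = ~np.isnan(total_salary)
print(total_salary[has_value].mean())   # 83200.0, same as np.nanmean

# Other nan-aware reductions skip missing values the same way
print(np.nansum(total_salary))          # 332800.0
print(np.nanmax(total_salary))          # 100800.0

# Count the missing entries (True counts as 1)
print(np.isnan(total_salary).sum())     # 1
```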

### Practice

1️⃣ Basics - NumPy 🔢<br>
Problem Statement:<br>
Create a NumPy array called `applications` representing the number of job applications received each day for a week (7 days). It should contain the following numbers: [10, 15, 7, 20, 25, 30, 5]. Print the array.

```python
applications = np.array([10, 15, 7, 20, 25, 30, 5])
```

```python
print(applications)
```

```
[10 15  7 20 25 30  5]
```

```python
applications
```

```
array([10, 15,  7, 20, 25, 30,  5])
```
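
The two cells above display the same array in different ways. `print()` shows the `str()` form. A bare expression at the end of a notebook cell shows the `repr()` form. This minimal sketch makes the difference explicit:

```python
import numpy as np

applications = np.array([10, 15, 7, 20, 25, 30, 5])

# print() uses str(): space-separated values, no "array(...)" wrapper
print(str(applications))    # [10 15  7 20 25 30  5]

# A bare expression in a notebook cell is displayed with repr()
print(repr(applications))   # array([10, 15,  7, 20, 25, 30,  5])
```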

Create a NumPy array called `postings` representing the number of job postings for different job titles. The array contains [10, 15, 7, 20, 25, 30, 5]. Use slicing to get the number of postings for the first three job titles.

```python
postings = np.array([10, 15, 7, 20, 25, 30, 5])
print(postings)
```

```
[10 15  7 20 25 30  5]
```

```python
# Slice from index 0 up to (but not including) index 3
first_three_jobs = postings[0:3]
```

```python
first_three_jobs
```

```
array([10, 15,  7])
```
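
A slice takes the form `start:stop:step`. The stop index is excluded, and you can omit any part. The sketch below shows a few related slices on the same postings data. It also shows that a basic slice is a view sharing memory with the original array, so use `.copy()` before you modify a slice on its own. The `last_two`, `every_other`, and `top_three` names are illustrative only.

```python
import numpy as np

postings = np.array([10, 15, 7, 20, 25, 30, 5])

first_three = postings[:3]    # omitting start means "from index 0" -> [10, 15, 7]
last_two = postings[-2:]      # negative indices count from the end -> [30, 5]
every_other = postings[::2]   # the third value sets the step -> [10, 7, 25, 5]

# Basic slices are views that share memory with the original array,
# so take a .copy() before modifying a slice independently
top_three = postings[:3].copy()
top_three[0] = 0
print(postings[0])            # still 10, because top_three is a copy
```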

Create a NumPy array called `salaries` representing the salaries offered for five different job positions. The array contains [70000, 85000, 60000, 95000, 80000]. Calculate the highest and lowest salary using NumPy functions.

```python
salaries = np.array([70000, 85000, 60000, 95000, 80000])
max_salary = np.max(salaries)  # highest salary
min_salary = np.min(salaries)  # lowest salary

print(salaries)
print(max_salary)
print(min_salary)
```

```
[70000 85000 60000 95000 80000]
95000
60000
```
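
`np.max` and `np.min` return the salary values themselves. `np.argmax` and `np.argmin` return the index where those values sit, which you can then use to look up a matching label. In this minimal sketch, the `positions` labels are hypothetical and are not part of the original exercise.

```python
import numpy as np

salaries = np.array([70000, 85000, 60000, 95000, 80000])
# Hypothetical labels, added only to show index lookups
positions = np.array(['Job A', 'Job B', 'Job C', 'Job D', 'Job E'])

highest_idx = np.argmax(salaries)   # 3
lowest_idx = np.argmin(salaries)    # 2

print(positions[highest_idx], salaries[highest_idx])  # Job D 95000
print(positions[lowest_idx], salaries[lowest_idx])    # Job C 60000

# Spread between the highest and lowest salary
salary_range = salaries.max() - salaries.min()        # 35000
```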
