![numpy-logo](../img/numpy-logo.png)

### Lists Recap
- Powerful
- Collection of values
- Can hold different types
- Can change, add, and remove elements
- What data science also needs:
    - Mathematical operations over entire collections
    - Speed

### Illustration of Analysis

In [1]:
weight = [40.5, 45.6, 55.8, 60.2]
height = [5, 5.3, 6, 5.8]

# Lists do not support element-wise arithmetic, so this line raises a TypeError
bmi = weight / (height**2)

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [4]:
bmi = []
for i in range(len(height)):
    bmi.append(weight[i]/height[i]**2)

In [6]:
bmi

[1.62, 1.6233535065859737, 1.5499999999999998, 1.789536266349584]

### Solution: NumPy
- NumPy = Numerical Python
- Alternative to the Python list: the NumPy array
- Calculations over entire arrays
- Easy & fast
- Installation
    - In the terminal: `pip install numpy`

In [7]:
!pip install numpy



In [8]:
import numpy as np

In [13]:
weight_np = np.array([40.5, 45.6, 55.8, 60.2])
height_np = np.array([5, 5.3, 6, 5.8])

print(type(weight_np), type(height_np), sep='\n')

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [14]:
bmi = weight_np / (height_np**2)

In [15]:
bmi

array([1.62      , 1.62335351, 1.55      , 1.78953627])

### Let's Do a Comparison

In [16]:
import time
import numpy as np

# Set size of the data
N = 10_000_000

# Generate data
py_list = list(range(N))
np_array = np.arange(N)

# Timing list computation (squaring elements)
# time.perf_counter() is a monotonic, high-resolution clock meant for measuring durations
start_time = time.perf_counter()
py_result = [x**2 for x in py_list]
py_duration = time.perf_counter() - start_time
print(f"List comprehension time: {py_duration:.4f} seconds")

# Timing numpy computation (squaring elements)
start_time = time.perf_counter()
np_result = np_array ** 2
np_duration = time.perf_counter() - start_time
print(f"NumPy vectorized time: {np_duration:.4f} seconds")

List comprehension time: 0.8506 seconds
NumPy vectorized time: 0.0240 seconds


### NumPy: Remarks
- A NumPy array holds a single data type; mixed values are coerced to one common type
- Operators act differently on lists and arrays

In [19]:
bio_data_np = np.array([40.5, 15, "Qasim"])
bio_data = [40.5, 15, "Qasim"]

In [20]:
print(bio_data_np, bio_data, sep='\n')

['40.5' '15' 'Qasim']
[40.5, 15, 'Qasim']
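
The coercion in the output above can be confirmed by inspecting the array's `dtype` attribute. A minimal sketch, reusing the values from the cell above; `numbers_np` is an illustrative name that is not part of the original notebook.

```python
import numpy as np

# Mixed input: two numbers and a string
bio_data_np = np.array([40.5, 15, "Qasim"])

# NumPy picks one common type for all elements; here that is a Unicode string type
print(bio_data_np.dtype.kind)   # 'U' stands for Unicode string

# With numbers only, the common type is a float
numbers_np = np.array([40.5, 15])
print(numbers_np.dtype)         # float64
```

Because every element becomes a string, arithmetic such as `bio_data_np * 2` no longer works on this array.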


In [23]:
np_arr = np.array([1,2,3])
np_arr * 2

array([2, 4, 6])

In [24]:
lst = [1,2,3]
lst * 2

[1, 2, 3, 1, 2, 3]

#### Different Types = Different Behaviour!
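
The same difference shows up with `+`. A short sketch with illustrative values (not from the original notebook): lists are concatenated, while arrays are added element by element.

```python
import numpy as np

lst_a, lst_b = [1, 2, 3], [10, 20, 30]
arr_a, arr_b = np.array(lst_a), np.array(lst_b)

list_sum = lst_a + lst_b     # list + list: concatenation
array_sum = arr_a + arr_b    # array + array: element-wise addition

print(list_sum)   # [1, 2, 3, 10, 20, 30]
print(array_sum)  # [11 22 33]
```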

# 2D NumPy Arrays
- `array.ndim` (an attribute, not a method) & `type(array)`
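
A minimal sketch of a 2D array, assuming we stack the height and weight values used earlier as two rows. The name `np_2d` is illustrative.

```python
import numpy as np

# Row 0: heights, row 1: weights (values reused from the BMI example)
np_2d = np.array([[5, 5.3, 6, 5.8],
                  [40.5, 45.6, 55.8, 60.2]])

print(type(np_2d))   # <class 'numpy.ndarray'>
print(np_2d.ndim)    # 2 (ndim is an attribute, so no parentheses)
print(np_2d.shape)   # (2, 4): 2 rows, 4 columns
```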

### Subsetting
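
A minimal sketch of subsetting a 2D array with `[row, column]` indexing. The array `np_2d` and the variable names are illustrative, not data from the original notebook.

```python
import numpy as np

np_2d = np.array([[5, 5.3, 6, 5.8],
                  [40.5, 45.6, 55.8, 60.2]])

first_row = np_2d[0]        # whole first row
single = np_2d[0, 2]        # row 0, column 2
cols = np_2d[:, 1:3]        # all rows, columns 1 and 2
second_row = np_2d[1, :]    # whole second row

print(first_row, single, cols, second_row, sep='\n')
```

The colon means "everything along this axis", so `np_2d[:, 0]` selects the first column of every row.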

### NumPy: Basic Statistics

- Data analysis
    - Get to know your data
    - Little data -> simply look at it
    - Big data -> ??

- `np.mean()`, `np.median()`, `np.corrcoef()`, `np.std()`, `np.sort()` (NumPy has no `np.mode()`; `scipy.stats.mode()` is one alternative)
- `np_city[:,0]` selects the first column

In [1]:
import numpy as np

np_city = np.array([[1.64, 71.78],[1.37, 63.35], [1.76, 55.09],[2.04, 74.85],[2.04, 68.72],[2.01, 73.57],
                   [2.64, 73.78],[3.37, 65.35], [4.76, 65.09],[5.04, 76.85],[6.04, 68.72],[7.01, 73.67],
                   [1.64, 71.78],[1.37, 63.35], [1.76, 55.09],[2.04, 74.85],[2.04, 68.72],[2.01, 73.57]])

- `np.sum()`, `np.sort()`, ...
- Arrays enforce a single data type, which is what makes these functions fast
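
The functions listed above can be tried on a column of `np_city`. This is a sketch that uses only the first six rows of the array for brevity; the `np.unique` approach to the mode is one possible workaround, since NumPy itself does not provide `np.mode()`.

```python
import numpy as np

# First six rows of np_city, shortened for brevity
np_city = np.array([[1.64, 71.78], [1.37, 63.35], [1.76, 55.09],
                    [2.04, 74.85], [2.04, 68.72], [2.01, 73.57]])

first_col = np_city[:, 0]
second_col = np_city[:, 1]

mean_val = np.mean(first_col)
median_val = np.median(first_col)
std_val = np.std(first_col)
corr = np.corrcoef(first_col, second_col)   # 2x2 correlation matrix

# Mode workaround: count each distinct value and take the most frequent one
values, counts = np.unique(first_col, return_counts=True)
mode_val = values[np.argmax(counts)]

print(mean_val, median_val, std_val, mode_val)
print(corr)
```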

### Generate data
- Arguments for `np.random.normal()`:
    - distribution mean
    - distribution standard deviation
    - number of samples
- Combine the generated columns with `np.column_stack((...))`
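
A minimal sketch of generating data this way. The means, standard deviations, sample count, and seed below are illustrative assumptions, not values from the original notebook.

```python
import numpy as np

np.random.seed(42)  # assumed seed, only to make the run repeatable

# Arguments: distribution mean, distribution standard deviation, number of samples
height = np.random.normal(1.75, 0.20, 5000)
weight = np.random.normal(60.32, 15, 5000)

# Note the double parentheses: column_stack takes a single tuple of arrays
np_city = np.column_stack((height, weight))

print(np_city.shape)  # (5000, 2)
```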

### Data visualization (Matplotlib)
- Very important in Data Analysis
- Explore data
- Report insights

- `Source: Gapminder, Wealth & Health of Nations`


### Basic plots with Matplotlib

#### line plot

##### Defining the X & Y axes
`year = [1950, 1970, 1990, 2010]` <br>
`pop = [2.519, 3.692, 5.263, 6.972]`
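
A minimal line-plot sketch using the two lists above. It assumes Matplotlib is installed (`pip install matplotlib`); the name `lines` is only there to show what `plt.plot()` returns.

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

# First argument goes on the x-axis, second on the y-axis
lines = plt.plot(year, pop)
```

In a notebook the figure renders at the end of the cell; in a plain script, finish with `plt.show()`.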

#### Scatter plot
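
A minimal scatter-plot sketch, reusing the `year` and `pop` values from the line plot. Unlike `plt.plot()`, `plt.scatter()` draws the individual points without connecting them.

```python
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

# One marker per (year, pop) pair, no connecting line
points = plt.scatter(year, pop)
```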

#### Histogram
- Explore a dataset
- Get an idea about the distribution
- `help(plt.hist)`
- `values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]`
- `plt.hist(values, bins=3)`
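
A sketch of the histogram call above that also captures its return values. With 12 values between 0 and 6 and `bins=3`, Matplotlib splits the range into three equal-width bins and counts the values in each.

```python
import matplotlib.pyplot as plt

values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]

# plt.hist returns the bin counts, the bin edges, and the drawn bars
counts, bin_edges, _ = plt.hist(values, bins=3)

print(counts)     # [4. 6. 2.]
print(bin_edges)  # [0. 2. 4. 6.]
```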

### Data visualization
- Many options
    - Different plot types
    - Many customizations
- Choice depends on
    - Data
    - Story you want to tell

In [1]:
years = [1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037, 2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047, 2048, 2049, 2050, 2051, 2052, 2053, 2054, 2055, 2056, 2057, 2058, 2059, 2060, 2061, 2062, 2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2078, 2079, 2080, 2081, 2082, 2083, 2084, 2085, 2086, 2087, 2088, 2089, 2090, 2091, 2092, 2093, 2094, 2095, 2096, 2097, 2098, 2099, 2100]
pop = [2.53, 2.57, 2.62, 2.67, 2.71, 2.76, 2.81, 2.86, 2.92, 2.97, 3.03, 3.08, 3.14, 3.2, 3.26, 3.33, 3.4, 3.47, 3.54, 3.62, 3.69, 3.77, 3.84, 3.92, 4.0, 4.07, 4.15, 4.22, 4.3, 4.37, 4.45, 4.53, 4.61, 4.69, 4.78, 4.86, 4.95, 5.05, 5.14, 5.23, 5.32, 5.41, 5.49, 5.58, 5.66, 5.74, 5.82, 5.9, 5.98, 6.05, 6.13, 6.2, 6.28, 6.36, 6.44, 6.51, 6.59, 6.67, 6.75, 6.83, 6.92, 7.0, 7.08, 7.16, 7.24, 7.32, 7.4, 7.48, 7.56, 7.64, 7.72, 7.79, 7.87, 7.94, 8.01, 8.08, 8.15, 8.22, 8.29, 8.36, 8.42, 8.49, 8.56, 8.62, 8.68, 8.74, 8.8, 8.86, 8.92, 8.98, 9.04, 9.09, 9.15, 9.2, 9.26, 9.31, 9.36, 9.41, 9.46, 9.5, 9.55, 9.6, 9.64, 9.68, 9.73, 9.77, 9.81, 9.85, 9.88, 9.92, 9.96, 9.99, 10.03, 10.06, 10.09, 10.13, 10.16, 10.19, 10.22, 10.25, 10.28, 10.31, 10.33, 10.36, 10.38, 10.41, 10.43, 10.46, 10.48, 10.5, 10.52, 10.55, 10.57, 10.59, 10.61, 10.63, 10.65, 10.66, 10.68, 10.7, 10.72, 10.73, 10.75, 10.77, 10.78, 10.79, 10.81, 10.82, 10.83, 10.84, 10.85]

#### Axis labels

`# axis labels` <br>
`plt.xlabel('Year')` <br>
`plt.ylabel('Population')`

#### Title

`# title` <br>
`plt.title('World Population Projections')`

#### Ticks

`# ticks` <br>
`plt.yticks([0, 2, 4, 6, 8, 10])`

`# ticks with custom labels` <br>
`plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])`

#### Add historical data

`# add more data` <br>
`year = [1800, 1850, 1900] + years` <br>
`pop = [1.0, 1.262, 1.650] + pop`
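
The customization steps can be combined into one cell. This sketch uses a four-point subset of the `years` and `pop` lists for brevity, so the curve is coarser than with the full data.

```python
import matplotlib.pyplot as plt

# Shortened subset of the years/pop lists, for brevity
years = [1950, 2000, 2050, 2100]
pop = [2.53, 6.13, 9.55, 10.85]

# Prepend the historical data
year = [1800, 1850, 1900] + years
pop = [1.0, 1.262, 1.650] + pop

plt.plot(year, pop)

# Axis labels, title, and ticks with custom labels (B = billion)
plt.xlabel('Year')
plt.ylabel('Population')
plt.title('World Population Projections')
plt.yticks([0, 2, 4, 6, 8, 10], ['0', '2B', '4B', '6B', '8B', '10B'])
```

All customization calls must come after `plt.plot()` and before the figure is shown, because they act on the current figure.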