**Comparisons and Masking in NumPy:**

1. Element-wise Comparison:
   - You can perform element-wise comparisons between NumPy arrays.
   - Example:
     ```python
     import numpy as np

     a = np.array([1, 3, 4])
     b = np.array([2, 2, 7])
     c = a < b
     ```
     Result: `[True, False, True]`

2. Checking All and Any:
   - You can check whether all or any of the comparisons are True using the `all()` and `any()` methods of the boolean array.
   - Example:
     ```python
     print(c.all())  # Check if all comparisons are True (False in this case)
     print(c.any())  # Check if any comparison is True (True in this case)
     ```

3. Counting True Comparisons:
   - You can count the number of True comparisons using `np.sum()` on the boolean array, because `True` counts as 1 and `False` as 0.
   - Example:
     ```python
     print(np.sum(c))  # Count the number of True comparisons (2 in this case)
     ```

4. Broadcasting Rules:
   - Broadcasting rules apply to comparison operations, allowing you to compare arrays of different shapes.
   - Example:
     ```python
     print(a > 0)  # Compare array 'a' with 0 element-wise
     ```
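
The scalar comparison above is the simplest case of broadcasting. As a minimal sketch of comparing arrays whose shapes differ, the block below uses a made-up 2x3 array `m`, a row `row`, and a column `col`; these names and values are assumptions and do not come from the notes.

```python
import numpy as np

# Hypothetical example data (not from the original notes)
m = np.array([[1, 5, 3],
              [4, 2, 6]])
row = np.array([2, 4, 3])      # shape (3,): stretched across both rows of m
col = np.array([[2], [4]])     # shape (2, 1): stretched across the columns of m

above_row = m > row   # every row of m is compared element-wise with `row`
above_col = m > col   # each row of m is compared with its own threshold
print(above_row)
print(above_col)
```

The result always has the broadcast shape, here (2, 3), so it can be used directly as a mask on `m`.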

**Using Real Data (Weather Data):**

1. Loading Weather Data:
   - You can use Pandas to load real data, such as daily average temperatures from a CSV file.
   - Example:
     ```python
     import pandas as pd
     a = pd.read_csv("https://raw.githubusercontent.com/csmastersUH/data_analysis_with_python_2020/master/kumpula-weather-2017.csv")['Air temperature (degC)'].values
     ```

2. Counting Days with Temperatures Below Zero:
   - You can count the number of days with temperatures below zero using NumPy.
   - Example:
     ```python
     print("Number of days with the temperature below zero", np.sum(a < 0))
     ```

**Boolean Operations and Masking:**

1. Combining Boolean Values:
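   - In core Python, single boolean values are combined with `and`, `or`, and `not`.
   - For boolean arrays, use the element-wise operators `&`, `|`, and `~`, and wrap each comparison in parentheses, because these operators bind more tightly than the comparison operators.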
   - Example:
     ```python
     np.sum((0 < a) & (a < 10))  # Count temperatures between 0 and 10
     ```

2. Masking:
   - Boolean arrays can be used to select a subset of elements.
   - Example:
     ```python
     c = a > 0
     print(c[:10])  # Print the first ten elements of the boolean array
     print(a[c])    # Select only the positive temperatures from 'a'
     ```

3. Masking to Assign New Values:
   - You can use masking to assign new values to specific elements in an array.
   - Example:
     ```python
     a[~c] = 0  # Set every non-positive temperature in 'a' to zero
     ```

Make sure to compare the modified array with the original array to understand how masking works.
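
One way to make that comparison is to keep a copy of the array before assigning through the mask. The sketch below uses a small made-up array `temps` in place of the weather file, so the names and values are assumptions.

```python
import numpy as np

# Made-up temperatures standing in for the weather data
temps = np.array([-3.5, 1.2, 0.0, 7.8, -0.4, 12.1])
original = temps.copy()              # masked assignment works in place, so keep a copy

positive = temps > 0                 # boolean mask
mild = (0 < temps) & (temps < 10)    # each comparison needs its own parentheses
temps[~positive] = 0                 # overwrite everything that is not positive

changed = original != temps          # True where the assignment altered a value
print(original[changed])             # the values that were replaced
print(np.sum(mild))                  # number of temperatures strictly between 0 and 10
```

Only the negative entries show up in `original[changed]`; the entry that was already `0.0` is covered by the mask but keeps its value.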

In [13]:
"""
Exercise 3.1 (column comparison)
Write function column_comparison that gets a two dimensional array as parameter. 
The function should return a new array containing those rows from the input that have 
the value in the second column larger than in the second last column. 
You may assume that the input contains at least two columns. 

Don't use loops, but instead vectorized operations. 
Try out your function in the main function.

For array

 [[8 9 3 8 8]
 [0 5 3 9 9]
 [5 7 6 0 4]
 [7 8 1 6 2]
 [2 1 3 5 8]]
the result would be

 [[8 9 3 8 8]
 [5 7 6 0 4]
 [7 8 1 6 2]]
"""

import numpy as np

def column_comparison(a):
    """
    The function should return a new array containing those rows 
    from the input that have the value in the second column larger than
    in the second last column.
    """
    second_column = a[:,1]
    second_last_column = a[:,-2]

    larger_than_bool = (second_column > second_last_column)
    larger_than_rows = a[larger_than_bool]
   
    return larger_than_rows
"""
def column_comparison(a):
    mask = a[:,1] > a[:,-2]
    return a[mask]
"""
def main():
    matrix = np.array([[8, 9, 3, 8, 8],
                  [0, 5, 3, 9, 9],
                  [5, 7, 6, 0, 4],
                  [7, 8, 1, 6, 2],
                  [2, 1, 3, 5, 8]])
    result=column_comparison(matrix)
    print(result)

main()

[[8 9 3 8 8]
 [5 7 6 0 4]
 [7 8 1 6 2]]


In [19]:
"""
Exercise 3.2 (first half second half)
Write function first_half_second_half that gets a two dimensional array of shape (n,2*m) as a parameter. 
The input array has 2*m columns. 

The output from the function should be a matrix with those rows from the input 
that have the sum of the first m elements larger than the sum of the last m elements on the row. 
Your solution should call the np.sum function or the corresponding method exactly twice.

Example of usage:

a = np.array([[1, 3, 4, 2],
              [2, 2, 1, 2]])
first_half_second_half(a)
array([[2, 2, 1, 2]])
"""


import numpy as np

def first_half_second_half(a):
    n_cols = a.shape[1]      # the input has 2*m columns
    mid = n_cols // 2        # m, the width of each half
    first_half = np.sum(a[:, :mid], axis=1)
    last_half = np.sum(a[:, mid:], axis=1)

    # keep the rows whose first half sums to more than the second half
    return a[first_half > last_half]
"""
def first_half_second_half(a):
    a1, a2 = np.split(a, 2, axis=1)
    mask = np.sum(a1, axis=1) > np.sum(a2, axis=1)
    return a[mask]
"""
def main():
    a = np.array([[1, 3, 4, 2],
              [2, 2, 1, 2]])
    print(first_half_second_half(a))
main()

[[2 2 1 2]]


**Fancy Indexing:**
- Ordinary indexing retrieves a single element from an array.
- To collect several non-contiguous elements that way, you need one index operation per element.
- Example:
  ```python
  np.random.seed(0)
  a = np.random.randint(0, 20, 20)
  a2 = np.array([a[2], a[5], a[7]])
  print(a2)  # Output: [0 7 19]
  ```

- Fancy indexing provides a more concise way to access multiple elements.
- Create a list of indices or use them directly to index the array.
- Example:
  ```python
  idx = [2, 5, 7]
  print(a[idx])  # Output: [0 7 19]
  ```

- You can also use fancy indexing to assign values to multiple elements.
- Example:
  ```python
  a[idx] = -1
  print(a)  # Output: [12 15 -1 3 3 -1 9 -1 18 4 6 12 1 6 7 14 17 5 13 8]
  ```

- Fancy indexing works with higher-dimensional arrays as well.
- Example:
  ```python
  b = np.arange(16).reshape(4, 4)
  row = np.array([0, 2])
  col = np.array([1, 3])
  print(b[row, col])  # Output: [1 11]
  ```

- The shape of the result array is determined by the shape of the index arrays, not the original array.

- Broadcasting rules can help avoid repetition when using fancy indexing with differently shaped index arrays.
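
The last two points can be sketched as follows. The index arrays `idx2d`, `rows`, and `cols` are made up for illustration and are not part of the original notes.

```python
import numpy as np

b = np.arange(16).reshape(4, 4)

# A 2x2 index array gives a 2x2 result, even though the indexed array is 1-D
idx2d = np.array([[0, 15],
                  [5, 10]])
picked = b.ravel()[idx2d]
print(picked)

# A (2, 1) row index broadcast against a (2,) column index selects
# every combination: rows 0 and 2 crossed with columns 1 and 3
rows = np.array([[0], [2]])
cols = np.array([1, 3])
block = b[rows, cols]
print(block)
```

Without broadcasting, selecting the same block would require writing out all four row and column pairs by hand.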

**Sorting Arrays:**
- The `np.sort()` function sorts the elements of an array in ascending order and returns a new array without modifying the original.
- Example:
  ```python
  a = np.array([2, 1, 4, 3, 5])
  sorted_a = np.sort(a)
  print(sorted_a)  # Output: [1 2 3 4 5]
  print(a)  # Output: [2 1 4 3 5]
  ```

- To sort the original array in-place, you can use the `.sort()` method of the array.
- Example:
  ```python
  a.sort()
  print(a)  # Output: [1 2 3 4 5]
  ```

- Multi-dimensional arrays can be sorted along a specific axis using the `axis` parameter.
- Example:
  ```python
  b = np.random.randint(0, 10, (4, 4))
  sorted_columns = np.sort(b, axis=0)
  sorted_rows = np.sort(b, axis=1)
  ```

- `argsort()` returns the indices that would sort the array, without modifying the original array.
- Example:
  ```python
  a = np.array([23, 12, 47, 35, 59])
  sorted_indices = np.argsort(a)
  ```

- You can use the sorted indices for fancy indexing to retrieve the sorted elements.
- Example:
  ```python
  sorted_elements = a[sorted_indices]
  ```

These concepts should help you understand and work with fancy indexing and sorting in NumPy.
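
Combining `argsort` with fancy indexing also reorders whole rows of a two-dimensional array by one of its columns. This is a minimal sketch with a made-up array `scores` (hypothetical data, not from the notes). It passes `kind="stable"` so that rows with equal keys keep their original relative order, which is a choice made for this example.

```python
import numpy as np

# Hypothetical data: column 1 is the sort key
scores = np.array([[3, 40],
                   [1, 20],
                   [2, 40],
                   [4, 10]])

order = np.argsort(scores[:, 1], kind="stable")   # indices that sort column 1 ascending
ascending = scores[order]                         # rows reordered by column 1

# Negating the keys gives a descending order while ties stay in input order
descending = scores[np.argsort(-scores[:, 1], kind="stable")]
print(ascending)
print(descending)
```

Reversing `ascending` with `[::-1]` would also give a descending order, but it would flip the order of tied rows as well.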

In [53]:
"""
Exercise 3.3 (most frequent first)
Note: 
This exercise is fairly difficult. 
Feel free to skip if you get stuck.
Write function most_frequent_first that gets a two dimensional array and an index c of a column as parameters.

The function should then return the array whose rows are sorted based on column c, in the following way. 

Rows are ordered so that those rows with the most frequent element in column c come first, 

then come the rows with the second most frequent element in column c, and so on.

Therefore, the values outside column c don't affect the ordering in any way.

Example of usage:

a:
 [[5 0 3 3 7 9 3 5 2 4]
 [7 6 8 8 1 6 7 7 8 1]
 [5 9 8 9 4 3 0 3 5 0]
 [2 3 8 1 3 3 3 7 0 1]
 [9 9 0 4 7 3 2 7 2 0]
 [0 4 5 5 6 8 4 1 4 9]
 [8 1 1 7 9 9 3 6 7 2]
 [0 3 5 9 4 4 6 4 4 3]
 [4 4 8 4 3 7 5 5 0 1]
 [5 9 3 0 5 0 1 2 4 2]]
print(most_frequent_first(a, -1))
 [[4 4 8 4 3 7 5 5 0 1]
 [2 3 8 1 3 3 3 7 0 1]
 [7 6 8 8 1 6 7 7 8 1]
 [5 9 3 0 5 0 1 2 4 2]
 [8 1 1 7 9 9 3 6 7 2]
 [9 9 0 4 7 3 2 7 2 0]
 [5 9 8 9 4 3 0 3 5 0]
 [0 3 5 9 4 4 6 4 4 3]
 [0 4 5 5 6 8 4 1 4 9]
 [5 0 3 3 7 9 3 5 2 4]]

If we look at the last column, we see that the number 1 appears three times, 
then both numbers 2 and 0 appear twice, and lastly numbers 3, 9, and 4 appear only once. 
Note that, for example, among those rows that contain in column c a number that appear twice in column c the order can be arbitrary.

Hint: the function np.unique may be useful.
"""

import numpy as np

def most_frequent_first(a, c):
    base_column = a[:,c]
    # count how often each distinct value occurs in column c
    unique_values, count_number = np.unique(base_column, return_counts=True)

    # map each unique value to its frequency
    frequency = dict(zip(unique_values, count_number))

    # look up the frequency of every row's value, negated so that
    # an ascending sort puts the most frequent values first
    base_column_frequency = [-frequency[value] for value in base_column]

    # indices that sort the negated frequencies in ascending order
    sorted_frequency_indices = np.argsort(base_column_frequency)

    # use those indices to reorder the rows of the original matrix a
    return a[sorted_frequency_indices]
 
"""
def most_frequent_first(a, c):
    b = a[:,c]   # get column c
    # return_inverse -> if True, also return the indices of the unique array that can be used to reconstruct the input.
    # return_counts -> if True, also return the number of times each unique item appears in the input.
    _,s,t = np.unique(b, return_inverse=True, return_counts=True)
    # t[s] -> the frequency of each row's value, in the original row order
    # np.argsort -> indices that sort those frequencies in ascending order
    idx = np.argsort(t[s])

    # reverse the reordered rows so that the most frequent values come first
    return a[idx][::-1]
"""


def main():
    data = [
        [5, 0, 3, 3, 7, 9, 3, 5, 2, 4],
        [7, 6, 8, 8, 1, 6, 7, 7, 8, 1],
        [5, 9, 8, 9, 4, 3, 0, 3, 5, 0],
        [2, 3, 8, 1, 3, 3, 3, 7, 0, 1],
        [9, 9, 0, 4, 7, 3, 2, 7, 2, 0],
        [0, 4, 5, 5, 6, 8, 4, 1, 4, 9],
        [8, 1, 1, 7, 9, 9, 3, 6, 7, 2],
        [0, 3, 5, 9, 4, 4, 6, 4, 4, 3],
        [4, 4, 8, 4, 3, 7, 5, 5, 0, 1],
        [5, 9, 3, 0, 5, 0, 1, 2, 4, 2]
    ]

    numpy_array = np.array(data)
    sorted_matrix = most_frequent_first(numpy_array, -1)
    print(sorted_matrix)
main()

[[4 4 8 4 3 7 5 5 0 1]
 [2 3 8 1 3 3 3 7 0 1]
 [7 6 8 8 1 6 7 7 8 1]
 [5 9 3 0 5 0 1 2 4 2]
 [8 1 1 7 9 9 3 6 7 2]
 [9 9 0 4 7 3 2 7 2 0]
 [5 9 8 9 4 3 0 3 5 0]
 [0 3 5 9 4 4 6 4 4 3]
 [0 4 5 5 6 8 4 1 4 9]
 [5 0 3 3 7 9 3 5 2 4]]
