
# Review: Counting Null Values


---


## `isna().sum(axis=0)` and `axis=1`

In data analysis, it's important to identify missing (null) values within a dataset. Null values can skew the results of an analysis and need to be handled appropriately. In this notebook, we review how to count null values in pandas with `isna().sum(axis=0)` (per column) and `isna().sum(axis=1)` (per row).

## Understanding Null Values
- Null values, also known as missing values, represent the absence of data in a specific cell or column of a dataset. In pandas they are typically shown as `NaN` (or `None` in object columns and `NaT` in datetime columns).
- They can occur for various reasons, such as incomplete data entry, data corruption, or data extraction issues.
- Identifying and handling null values is a key step in data cleaning and preprocessing, and a prerequisite for accurate analysis.
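A quick sketch (the Series is made up for illustration) showing that `isna()` flags both Python's `None` and NumPy's `np.nan` as missing, while ordinary values come back `False`:

```python
import numpy as np
import pandas as pd

# Toy Series mixing regular values with two kinds of missing markers
s = pd.Series([1, None, np.nan, "x"])

# isna() treats None and np.nan alike
mask = s.isna()
print(mask.tolist())  # [False, True, True, False]
print(mask.sum())     # 2
```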

## The `isna()` Function
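- The `isna()` method returns a Boolean DataFrame with the same shape as the original, containing `True` for cells with null values and `False` for non-null values.
- On its own, this mask is hard to read for anything but a tiny DataFrame. It becomes useful when aggregated: because `True` counts as 1 and `False` as 0, summing it counts the nulls.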
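Here is what that Boolean mask looks like on a small, made-up DataFrame. It keeps the original row and column labels and marks each missing cell with `True`:

```python
import pandas as pd

# Toy data: one missing number in A, one missing string in B
df = pd.DataFrame({"A": [1, None, 3], "B": ["x", "y", None]})

mask = df.isna()               # Boolean DataFrame, same shape and labels as df
print(mask)
print(mask.shape == df.shape)  # True
```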

## Counting **column-wise** with `sum(axis=0)`
- `isna().sum(axis=0)` adds up the `True` values (nulls) vertically, down each column.
- The result is a Series with one null count per column of the DataFrame.
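A minimal sketch on a toy DataFrame (the data is made up). It counts nulls per column and checks that calling `sum()` with no arguments gives the same counts, since `axis=0` is the default for `DataFrame.sum()`:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, None, None],
    "B": [4, 5, 6],
    "C": [None, "y", None],
})

per_column = df.isna().sum(axis=0)  # counts: A -> 2, B -> 0, C -> 2
print(per_column)

# axis=0 is the default, so sum() gives the same Series
print(per_column.equals(df.isna().sum()))  # True
```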

## Counting **row-wise** with `sum(axis=1)`
- `isna().sum(axis=1)` adds up the `True` values (nulls) horizontally, across each row.
- The result is a Series with one null count per row of the DataFrame.
- A common pattern is to store these counts in a new column and use it to filter out rows that contain many nulls.


```python
import pandas as pd

# Create a sample DataFrame with null values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [None, 10, 11, None],
    'D': [12, 13, 14, 15]
}
df = pd.DataFrame(data)

# Count null values column-wise (axis=0)
null_counts_column = df.isna().sum(axis=0)

# Count null values row-wise (axis=1) and store them in a new column
df["null_count"] = df.isna().sum(axis=1)

print("Null value counts column-wise:")
print(null_counts_column)
print()

print("Null value counts row-wise:")
print(df)

```
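Once the row-wise counts are stored in a column, rows with many nulls can be selected. This sketch rebuilds the same sample data so it runs on its own; the threshold of 2 is an arbitrary choice for illustration. Boolean indexing and `DataFrame.query()` select the same rows:

```python
import pandas as pd

# Same sample data as the example above
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [None, 10, 11, None],
    'D': [12, 13, 14, 15]
}
df = pd.DataFrame(data)
df["null_count"] = df.isna().sum(axis=1)

# Keep rows with at least 2 nulls (threshold chosen only for illustration)
many_nulls = df[df["null_count"] >= 2]

# The same filter written with query()
many_nulls_q = df.query("null_count >= 2")

print(many_nulls)
print(many_nulls.equals(many_nulls_q))  # True
```

Boolean indexing works with any column name, while `query()` reads more like plain English when the column name is a valid Python identifier.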



# Let's practice


---



1. Import pandas with `import pandas as pd`.
2. Read the following CSV into a DataFrame called `abnb`:
   ```
   "https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_3/data/sd_listings.csv"
   ```
   - Check it out using `head()`.
   - Use `info()` to check the nulls and data types.
3. Use `.isna().sum(axis=0)` to get a summary count of nulls by column.
4. Create a new column called `row_null_count` using `.isna().sum(axis=1)`.
   - Use `head()` to make sure all is well.
5. Use `query()` to filter rows where `row_null_count >= 5`.

## Sample Solution


---



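```python
import pandas as pd

abnb = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_3/data/sd_listings.csv")

# A notebook cell only displays its last expression, so print the intermediate steps
print(abnb.head())
abnb.info()                               # info() prints its own summary
print(abnb.isna().sum(axis=0))            # null count per column

abnb["row_null_count"] = abnb.isna().sum(axis=1)  # null count per row
print(abnb.head())

# Rows with 5 or more missing values
abnb.query('row_null_count >= 5')
```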

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license | row_null_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1591 | 14116320 | Modern La Jolla Escape | 66115585 | NaN | NaN | La Jolla | 32.8322 | -117.25315 | Entire home/apt | 1647 | 1 | 0 | NaN | NaN | 3 | 0 | 0 | NaN | 5 |
| 1594 | 14116330 | Villa Portofino | 66115585 | NaN | NaN | La Jolla | 32.84609 | -117.26016 | Entire home/apt | 3049 | 1 | 0 | NaN | NaN | 3 | 0 | 0 | NaN | 5 |
| 1596 | 14116332 | Forever Views | 66115585 | NaN | NaN | La Jolla | 32.82401 | -117.24479 | Entire home/apt | 1708 | 1 | 0 | NaN | NaN | 3 | 0 | 0 | NaN | 5 |
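To see *which* columns account for a row's count, the same Boolean mask can be read row by row. A minimal sketch, using only a subset of the columns from the three rows shown above (with their index labels) so it runs without downloading the CSV; `None` marks the cells that were missing:

```python
import pandas as pd

# Subset of columns from the three rows in the output above
rows = pd.DataFrame(
    {
        "name": ["Modern La Jolla Escape", "Villa Portofino", "Forever Views"],
        "host_name": [None, None, None],
        "neighbourhood_group": [None, None, None],
        "neighbourhood": ["La Jolla", "La Jolla", "La Jolla"],
        "last_review": [None, None, None],
        "reviews_per_month": [None, None, None],
        "license": [None, None, None],
    },
    index=[1591, 1594, 1596],
)

mask = rows.isna()

# For each row, keep the column names where the mask is True
null_columns = {label: list(row[row].index) for label, row in mask.iterrows()}

print(mask.sum(axis=1).tolist())  # [5, 5, 5]
print(null_columns[1591])
```

Each of these listings is missing the same five fields, which is why all three have `row_null_count` equal to 5.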
