### **Difference between axis = 0 and axis = 1**
In pandas, the `axis` argument names the direction an operation runs along:

1. `axis=0` (alias `'index'`) runs down the rows, along the vertical axis.
2. `axis=1` (alias `'columns'`) runs across the columns, along the horizontal axis.

The wording is easy to misread, because the axis you name is the one that gets collapsed or searched, not the one that survives:

- `axis=0`:
  - A reduction such as `df.sum(axis=0)` moves down the rows of each column and returns one value per column.
  - In `df.drop(label, axis=0)`, the label is looked up in the row index, so a row is removed.
  - A Series has only this one axis, so `axis=0` is the only meaningful value for Series methods.

- `axis=1`:
  - A reduction such as `df.sum(axis=1)` moves across the columns of each row and returns one value per row.
  - In `df.drop(label, axis=1)`, the label is looked up in the column labels, so a column is removed.
  - It does not apply to a Series, which has no column axis.

Here are two common operations to illustrate the difference:

- `df.sum(axis=0)` gives the sum of the values in each column, as a Series indexed by the column labels.
- `df.sum(axis=1)` gives the sum of the values in each row, as a Series indexed by the row labels.

In summary, choose `axis=0` to work down the rows (one result per column) and `axis=1` to work across the columns (one result per row).
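
The two directions can be checked on a tiny frame. This is a minimal sketch: the frame `df`, its values, and the row labels `r1` to `r3` are made up for illustration and are not part of the sales data.

```python
import pandas as pd

# Small illustrative frame (made-up values, not the sales data)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['r1', 'r2', 'r3'])

# axis=0: collapse the rows, one result per column
col_totals = df.sum(axis=0)   # A -> 6, B -> 15

# axis=1: collapse the columns, one result per row
row_totals = df.sum(axis=1)   # r1 -> 5, r2 -> 7, r3 -> 9

# For drop, axis says which set of labels to search
without_r1 = df.drop('r1', axis=0)  # removes a row
without_b = df.drop('B', axis=1)    # removes a column

print(col_totals)
print(row_totals)
```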

### **Difference between inplace = True and inplace = False**
The `inplace` parameter in Pandas methods like `df.drop` and `df.dropna` determines whether the operation should modify the DataFrame in place (in the same variable) or return a new DataFrame with the operation applied. Here's the difference between `inplace=True` and `inplace=False`:

1. `inplace=True`:
   - When you set `inplace=True`, the operation is applied directly to the existing DataFrame object, and the method returns `None`.
   - The original DataFrame is altered, so there is no result to assign to a new variable.
   - It is not a reliable memory or speed optimization, because pandas often builds a copy internally anyway. Use it with caution: the previous state of the DataFrame is lost unless you kept a copy.

Example:

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Drop column 'B' in-place
df.drop('B', axis=1, inplace=True)

# df now no longer contains column 'B'
```

2. `inplace=False` (default behavior):
   - When you set `inplace=False` (or omit the `inplace` parameter, as it defaults to `False`), the operation returns a new DataFrame with the operation applied, leaving the original DataFrame unchanged.
   - You need to assign the result to a new variable if you want to keep the modified DataFrame.

Example:

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Create a new DataFrame with column 'B' dropped, leaving df unchanged
new_df = df.drop('B', axis=1, inplace=False)

# df still contains column 'B', and new_df does not
```

In general, it's considered a good practice to use `inplace=False` (or omit it) because it helps avoid unintentional data modification and makes the code more explicit. If you want to modify the DataFrame in place, then you can set `inplace=True`.
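
A common slip is assigning the result of an `inplace=True` call. The sketch below, on the same made-up two-column frame as the examples above, shows that such a call returns `None` and that plain reassignment reaches the same end state.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# With inplace=True the method returns None, so never assign its result.
# A copy is used here so that df itself stays untouched for the next step.
result = df.copy().drop('B', axis=1, inplace=True)
print(result)  # None

# Reassigning the returned DataFrame gives the same end state
df = df.drop('B', axis=1)
print(list(df.columns))  # ['A']
```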

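### *Run the commands below in a code cell if these packages are not available in your environment*
* ! pip install pandas
* ! pip install matplotlib
* ! pip install openpyxl

`os` and `datetime` are part of the Python standard library, so they do not need to be installed with pip. `openpyxl` is the engine pandas uses to read `.xlsx` files.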

### *Loading Libraries*

In [1]:
import pandas as pd
import os
import datetime

#### Import Sales Data

In [2]:
xlsx = pd.ExcelFile('Sales-Data.xlsx')
sheet_names = xlsx.sheet_names
sheet_names

['Customer-Base', 'Sales-Data']

In [20]:
sales_data = pd.read_excel('Sales-Data.xlsx', sheet_name= 'Sales-Data')
sales_data.head()

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue
0,2022-01-01,DYSMJ47747,WNYQN5037,85
1,2022-01-01,JDOBV42881,GJMJI5215,87
2,2022-01-01,IWATI93376,AWXSL7355,76
3,2022-01-01,BYIGR33509,TTPXB4921,56
4,2022-01-01,KEQKA35598,WQYYR7437,84


### Selecting Columns
https://www.kdnuggets.com/2019/06/select-rows-columns-pandas.html

### Select columns based on types

In [4]:
sales_data.select_dtypes('object')

Unnamed: 0,InvoiceID,CustomerID
0,DYSMJ47747,WNYQN5037
1,JDOBV42881,GJMJI5215
2,IWATI93376,AWXSL7355
3,BYIGR33509,TTPXB4921
4,KEQKA35598,WQYYR7437
...,...,...
74983,OCOXA30790,LZIEQ1426
74984,JHUDQ37932,HUNOH2192
74985,IMNAE85968,CVYTU7738
74986,JPDLA56525,QLRFI3440


In [5]:
sales_data.select_dtypes('int64')

Unnamed: 0,InvoiceValue
0,85
1,87
2,76
3,56
4,84
...,...
74983,522
74984,521
74985,475
74986,597


In [6]:
sales_data.select_dtypes('datetime64[ns]')

Unnamed: 0,InvoiceDate
0,2022-01-01
1,2022-01-01
2,2022-01-01
3,2022-01-01
4,2022-01-01
...,...
74983,2022-12-31
74984,2022-12-31
74985,2022-12-31
74986,2022-12-31
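
`select_dtypes` also accepts broader type names and an `exclude` argument. The sketch below uses a hypothetical three-row stand-in called `sample`, built from the first rows shown above, so it runs without the Excel file.

```python
import pandas as pd

# Tiny stand-in for sales_data (hypothetical name `sample`)
sample = pd.DataFrame({
    'InvoiceDate': pd.to_datetime(['2022-01-01', '2022-01-01', '2022-01-01']),
    'InvoiceID': ['DYSMJ47747', 'JDOBV42881', 'IWATI93376'],
    'CustomerID': ['WNYQN5037', 'GJMJI5215', 'AWXSL7355'],
    'InvoiceValue': [85, 87, 76],
})

# 'number' matches every numeric dtype, not only int64
numeric_cols = sample.select_dtypes(include='number')

# 'datetime' matches datetime64 columns
date_cols = sample.select_dtypes(include='datetime')

# exclude keeps everything except the listed dtypes
non_numeric = sample.select_dtypes(exclude='number')

print(list(numeric_cols.columns))
print(list(date_cols.columns))
print(list(non_numeric.columns))
```

Selecting by `'number'` rather than `'int64'` keeps working if a column is later read as float, for example because of missing values.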


### Select columns with a list of variables


In [7]:
sales_data.columns

Index(['InvoiceDate', 'InvoiceID', 'CustomerID', 'InvoiceValue'], dtype='object')

In [9]:
# Inline list: works, but the column list is harder to reuse
sales_data[['InvoiceDate', 'InvoiceValue']]

Unnamed: 0,InvoiceDate,InvoiceValue
0,2022-01-01,85
1,2022-01-01,87
2,2022-01-01,76
3,2022-01-01,56
4,2022-01-01,84
...,...,...
74983,2022-12-31,522
74984,2022-12-31,521
74985,2022-12-31,475
74986,2022-12-31,597


In [10]:
# Named list: easier to reuse and maintain
fields_to_select = ['InvoiceDate', 'InvoiceValue'] 
sales_data[fields_to_select]

Unnamed: 0,InvoiceDate,InvoiceValue
0,2022-01-01,85
1,2022-01-01,87
2,2022-01-01,76
3,2022-01-01,56
4,2022-01-01,84
...,...,...
74983,2022-12-31,522
74984,2022-12-31,521
74985,2022-12-31,475
74986,2022-12-31,597
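
The number of brackets matters when selecting columns. As a small sketch on a hypothetical two-row frame `sample`: a single label returns a Series, while a list of labels returns a DataFrame, even if the list has one entry.

```python
import pandas as pd

# Hypothetical two-row stand-in for sales_data
sample = pd.DataFrame({
    'InvoiceDate': pd.to_datetime(['2022-01-01', '2022-01-01']),
    'InvoiceValue': [85, 87],
})

as_series = sample['InvoiceValue']     # single label -> Series
as_frame = sample[['InvoiceValue']]    # list of labels -> DataFrame

# .loc selects the same columns and makes the "all rows" part explicit
fields_to_select = ['InvoiceDate', 'InvoiceValue']
subset = sample.loc[:, fields_to_select]

print(type(as_series).__name__, type(as_frame).__name__)
```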


### Delete columns 

In [16]:
# Single column (returns a new DataFrame; sales_data itself is unchanged)
sales_data.drop('InvoiceValue', axis=1)

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID
0,2022-01-01,DYSMJ47747,WNYQN5037
1,2022-01-01,JDOBV42881,GJMJI5215
2,2022-01-01,IWATI93376,AWXSL7355
3,2022-01-01,BYIGR33509,TTPXB4921
4,2022-01-01,KEQKA35598,WQYYR7437
...,...,...,...
74983,2022-12-31,OCOXA30790,LZIEQ1426
74984,2022-12-31,JHUDQ37932,HUNOH2192
74985,2022-12-31,IMNAE85968,CVYTU7738
74986,2022-12-31,JPDLA56525,QLRFI3440


In [17]:
# Multiple columns (returns a new DataFrame; sales_data itself is unchanged)
sales_data.drop(['InvoiceValue', 'InvoiceID'], axis=1)

Unnamed: 0,InvoiceDate,CustomerID
0,2022-01-01,WNYQN5037
1,2022-01-01,GJMJI5215
2,2022-01-01,AWXSL7355
3,2022-01-01,TTPXB4921
4,2022-01-01,WQYYR7437
...,...,...
74983,2022-12-31,LZIEQ1426
74984,2022-12-31,HUNOH2192
74985,2022-12-31,CVYTU7738
74986,2022-12-31,QLRFI3440


In [18]:
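# Delete using del
# del modifies sales_data in place and cannot be undone, so use it very cautiously.
# Later cells rely on CustomerID: reload sales_data before continuing.
del sales_data['CustomerID']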
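
A few variations on column removal can be sketched on a small frame. The name `sample`, the label `NotAColumn`, and the working copy are illustrative assumptions. `columns=` is the keyword form of `axis=1`, `errors='ignore'` skips missing labels, and `pop` removes a column in place like `del` but hands it back.

```python
import pandas as pd

# Hypothetical stand-in for sales_data
sample = pd.DataFrame({
    'InvoiceDate': pd.to_datetime(['2022-01-01', '2022-01-01']),
    'InvoiceID': ['DYSMJ47747', 'JDOBV42881'],
    'CustomerID': ['WNYQN5037', 'GJMJI5215'],
    'InvoiceValue': [85, 87],
})

# columns= is equivalent to passing the labels with axis=1
trimmed = sample.drop(columns=['InvoiceValue', 'InvoiceID'])

# errors='ignore' skips labels that are absent instead of raising KeyError
safe = sample.drop(columns=['NotAColumn'], errors='ignore')

# pop removes the column in place (like del) and returns it as a Series
working = sample.copy()
customer_ids = working.pop('CustomerID')

print(list(trimmed.columns))
print(list(working.columns))
```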

### *Working with Dates*
* https://www.programiz.com/python-programming/datetime/strftime
* https://strftime.org/

#### *Selecting Rows*

In [42]:
# Create a new feature named Month from InvoiceDate (abbreviated month name, e.g. 'Jan')
sales_data['Month'] = sales_data['InvoiceDate'].dt.strftime('%b')
sales_data.head()

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue,Month
0,2022-01-01,DYSMJ47747,WNYQN5037,85,Jan
1,2022-01-01,JDOBV42881,GJMJI5215,87,Jan
2,2022-01-01,IWATI93376,AWXSL7355,76,Jan
3,2022-01-01,BYIGR33509,TTPXB4921,56,Jan
4,2022-01-01,KEQKA35598,WQYYR7437,84,Jan
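
Other date parts can be derived the same way through the `.dt` accessor. This is a sketch on three made-up dates. The column names `MonthNumber` and `Weekday` are illustrative, and the month abbreviations assume an English locale.

```python
import pandas as pd

# Three illustrative dates (not taken from the sales data as a set)
sample = pd.DataFrame({
    'InvoiceDate': pd.to_datetime(['2022-01-01', '2022-10-02', '2022-12-31'])
})

# Abbreviated month name, vectorized over the whole column
sample['Month'] = sample['InvoiceDate'].dt.strftime('%b')

# Month as an integer, handy for sorting in calendar order
sample['MonthNumber'] = sample['InvoiceDate'].dt.month

# Full weekday name
sample['Weekday'] = sample['InvoiceDate'].dt.day_name()

print(sample)
```

A numeric month sorts in calendar order, whereas the `'%b'` strings sort alphabetically (`'Apr'` before `'Jan'`).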


In [47]:
# one condition
WNYQN5037_Sales = sales_data[(sales_data['CustomerID'] == 'WNYQN5037')]
WNYQN5037_Sales

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue,Month
0,2022-01-01,DYSMJ47747,WNYQN5037,85,Jan
62,2022-01-01,VSYWX90154,WNYQN5037,89,Jan
73,2022-01-01,CFAKS15904,WNYQN5037,61,Jan
80,2022-01-01,VGDKE48353,WNYQN5037,15,Jan
94,2022-01-01,BSJQP93651,WNYQN5037,40,Jan
...,...,...,...,...,...
74608,2022-12-29,FEJAM89694,WNYQN5037,481,Dec
74676,2022-12-30,GYTES76266,WNYQN5037,480,Dec
74867,2022-12-31,GEMIH97389,WNYQN5037,276,Dec
74896,2022-12-31,DWCQQ82896,WNYQN5037,148,Dec


In [48]:
# Multiple conditions
WNYQN5037_Sales = sales_data[(sales_data['CustomerID'] == 'WNYQN5037') & (sales_data['Month'] == 'Oct')]
WNYQN5037_Sales

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue,Month
56217,2022-10-01,QAXSD43741,WNYQN5037,332,Oct
56353,2022-10-02,OGCZN52054,WNYQN5037,74,Oct
56380,2022-10-02,AFBFA45866,WNYQN5037,233,Oct
56526,2022-10-02,QQCDI87320,WNYQN5037,395,Oct
56679,2022-10-03,CRBRT24019,WNYQN5037,411,Oct
...,...,...,...,...,...
62087,2022-10-29,SOZWY27290,WNYQN5037,225,Oct
62194,2022-10-30,DNPSO67771,WNYQN5037,297,Oct
62224,2022-10-30,YDMNV70933,WNYQN5037,341,Oct
62353,2022-10-31,QFSLA80599,WNYQN5037,147,Oct
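
Each condition in a filter is a boolean Series, and `&` / `|` combine them element by element. The parentheses around each comparison are required because `&` binds more tightly than `==`. The sketch below uses a hypothetical three-row frame and also shows `query` as an alternative spelling of the same filter.

```python
import pandas as pd

# Hypothetical three-row stand-in for sales_data
sample = pd.DataFrame({
    'CustomerID': ['WNYQN5037', 'WNYQN5037', 'GJMJI5215'],
    'InvoiceValue': [85, 332, 87],
    'Month': ['Jan', 'Oct', 'Jan'],
})

# Naming the masks keeps long filters readable
is_customer = sample['CustomerID'] == 'WNYQN5037'
is_october = sample['Month'] == 'Oct'

both = sample[is_customer & is_october]     # AND: both conditions hold
either = sample[is_customer | is_october]   # OR: at least one holds

# query expresses the same AND filter as a string
same_as_both = sample.query("CustomerID == 'WNYQN5037' and Month == 'Oct'")

print(both)
print(either)
```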


In [50]:
# Filtering rows based on date (the strict > excludes invoices dated 2022-07-01 itself)
July_onward_sales = sales_data[sales_data['InvoiceDate'] > '2022-07-01']
July_onward_sales.Month.value_counts()
Month
Dec    6420
Aug    6406
Oct    6367
Jul    6198
Sep    6102
Nov    6066
Name: count, dtype: int64
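
The boundary date deserves attention when filtering on dates. As a sketch on four made-up dates, `>` drops rows that fall exactly on the cutoff, `>=` keeps them, and `between` is inclusive on both ends by default.

```python
import pandas as pd

# Four illustrative dates around the cutoff
sample = pd.DataFrame({
    'InvoiceDate': pd.to_datetime(
        ['2022-06-30', '2022-07-01', '2022-07-02', '2022-12-31']
    )
})

strictly_after = sample[sample['InvoiceDate'] > '2022-07-01']    # drops 2022-07-01
from_july_first = sample[sample['InvoiceDate'] >= '2022-07-01']  # keeps 2022-07-01

# between includes both endpoints by default
second_half = sample[sample['InvoiceDate'].between('2022-07-01', '2022-12-31')]

print(len(strictly_after), len(from_july_first), len(second_half))
```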

In [51]:
# Filtering rows based on numeric value
sales_more_than_500 = sales_data[sales_data['InvoiceValue'] > 500]
sales_more_than_500.head()

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue,Month
62505,2022-11-01,VUKYY91914,FPDOB4807,589,Nov
62509,2022-11-01,ZEZXW79957,CUXTG3496,539,Nov
62524,2022-11-01,IJZRI43271,SHBKJ2911,583,Nov
62527,2022-11-01,HNYGD95443,COIKE5961,576,Nov
62533,2022-11-01,VNBDJ46817,IQHAR1433,591,Nov


In [52]:
# Filtering rows based on a list
Q1 = ['Jan', 'Feb', 'Mar']
Q1_data = sales_data[sales_data['Month'].isin(Q1)]
Q1_data.head()

Unnamed: 0,InvoiceDate,InvoiceID,CustomerID,InvoiceValue,Month
0,2022-01-01,DYSMJ47747,WNYQN5037,85,Jan
1,2022-01-01,JDOBV42881,GJMJI5215,87,Jan
2,2022-01-01,IWATI93376,AWXSL7355,76,Jan
3,2022-01-01,BYIGR33509,TTPXB4921,56,Jan
4,2022-01-01,KEQKA35598,WQYYR7437,84,Jan


In [57]:
# Excluding rows based on a list (~ negates the isin mask)
Q1 = ['Jan', 'Feb', 'Mar']
Q2_4_data = sales_data[~sales_data['Month'].isin(Q1)]
Q2_4_data.Month.value_counts()

Month
Dec    6420
May    6407
Aug    6406
Jul    6397
Oct    6367
Apr    6174
Sep    6102
Nov    6066
Jun    6046
Name: count, dtype: int64
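
Since `~` flips every value of a boolean mask, the `isin` filter and its negation split the rows into two groups that do not overlap. This is a minimal sketch on a hypothetical four-row frame.

```python
import pandas as pd

# Hypothetical four-row stand-in for sales_data
sample = pd.DataFrame({
    'Month': ['Jan', 'Feb', 'Apr', 'Dec'],
    'InvoiceValue': [85, 87, 76, 522],
})

Q1 = ['Jan', 'Feb', 'Mar']
in_q1 = sample['Month'].isin(Q1)   # True where Month is in the list

q1_rows = sample[in_q1]            # rows inside the list
other_rows = sample[~in_q1]        # rows outside the list

# Together the two pieces cover every row exactly once
print(len(q1_rows), len(other_rows), len(sample))
```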