# Sorting a pandas DataFrame

In pandas, the main method for sorting is `.sort_values()`.<br>
It can sort by one or more columns.<br>
Note:
* Key parameters of `sort_values()`: <br>
  1. <b>by:</b> The column label (or list of labels) to sort by.<br>
  2. <b>ascending:</b> Boolean or list of booleans (default True). If False, sort in descending order.<br>
  3. <b>inplace:</b> Boolean (default False). If True, sorts the original DataFrame and returns None; otherwise, a new sorted DataFrame is returned.<br>
  4. <b>na_position:</b> Whether to place NaN values at the beginning ('first') or at the end ('last', the default).<br>
  5. <b>ignore_index:</b> Boolean (default False). If True, the sorted result is relabeled 0, 1, ..., n - 1.
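The `inplace` and `ignore_index` behaviour listed above can be checked with a small sketch. The DataFrame is a cut-down copy of the example data used in the next cell. Because `inplace=True` returns `None`, its result should not be assigned back to the DataFrame variable.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Franklin", "Michel"],
                   "Age": [25, 30, 45, 40]})

# ignore_index=True relabels the sorted rows 0, 1, 2, ...
by_age = df.sort_values(by="Age", ascending=False, ignore_index=True)
print(by_age)

# inplace=True sorts df itself and returns None
result = df.sort_values(by="Age", inplace=True)
print(result)  # None
print(df)      # df is now sorted; the original index labels are kept
```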

In [None]:
import pandas as pd
data = {"Name":["Alice","Bob","Franklin","Michel"],
       "Age":[25,30,45,40],
       "Score":[80,90,95,85]}
# create dataframe
df = pd.DataFrame(data)
# sort by the "Age" column in ascending order (the default)
sort_df = df.sort_values(by="Age")
print(sort_df)

In [None]:
# sort "Age" in descending order
sort_df2 = df.sort_values(by="Age", ascending=False)  # ascending=True (the default) would give ascending order
print(sort_df2)

In [None]:
# sorting multiple columns 
import pandas as pd
data = {"Name":["Alice","Bob","Franklin","Michel"],
       "Age":[25,30,45,40],
       "Score":[80,90,95,85]}
# create dataframe
df2 = pd.DataFrame(data)

sort_age_score = df2.sort_values(by=["Age","Score"],ascending=True)
print(sort_age_score)
# rows are sorted by "Age"; "Score" is only used to break ties between equal ages
# (every age here is unique, so "Score" never changes the order)
# to sort by "Score" alone:
sort_score=df2.sort_values(by="Score",ascending=True)
print("Sorted Score :\n",sort_score)
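Because every age in the data above is unique, the second sort key never comes into play. The sketch below uses hypothetical data with repeated ages, and also shows that `ascending` accepts a list with one direction per column.

```python
import pandas as pd

# Hypothetical data with repeated ages so the second sort key matters
scores = pd.DataFrame({"Name": ["Alice", "Bob", "Carol", "Dan"],
                       "Age": [30, 25, 30, 25],
                       "Score": [80, 90, 95, 70]})

# "Age" ascending; within the same age, "Score" descending
ranked = scores.sort_values(by=["Age", "Score"], ascending=[True, False])
print(ranked)
```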

<h1>Sorting a dataset with missing values</h1>
This is where na_position comes in.<br>
Note: by default, sort_values() places missing values (NaN) at the end.<br>
To place them at the beginning instead, use na_position="first".

In [None]:
import pandas as pd
data_nan = {"Name":["Charie","Bob","Alice","Michel"],
        "Age":[25,22,None,22]}
df_nan = pd.DataFrame(data_nan)
sort1 = df_nan.sort_values(by="Age")
print(sort1)
# sort by age, placing the missing value first
sort_nan = df_nan.sort_values(by="Age", na_position="first")
print("after using na_position:\n", sort_nan)
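`na_position` is independent of the sort direction: even in descending order, NaN stays at the end unless `na_position="first"` is given. A small sketch reusing the same data:

```python
import pandas as pd

df_nan = pd.DataFrame({"Name": ["Charie", "Bob", "Alice", "Michel"],
                       "Age": [25, 22, None, 22]})

# Descending order still puts the missing age last by default
desc = df_nan.sort_values(by="Age", ascending=False)
print(desc)

# Explicitly move the missing age to the top
desc_nan_first = df_nan.sort_values(by="Age", ascending=False, na_position="first")
print(desc_nan_first)
```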

# Choosing the sorting algorithm with the kind parameter

<b>1. quicksort (default):</b> Quicksort is a highly efficient divide-and-conquer sorting algorithm.<br>
It selects a "pivot" element and partitions the dataset into two parts:<br>
one with elements smaller than the pivot and one with elements greater than the pivot, then sorts each part recursively.<br>
<b>2. mergesort:</b> Divides the dataset into smaller subarrays, sorts them,<br>
and then merges them back together in sorted order. It is stable: rows with equal values keep their original order.<br>
<b>3. heapsort:</b> Heapsort is another comparison-based sorting algorithm that<br>
builds a heap data structure to repeatedly extract the largest or smallest element and reorder the dataset.<br>
Note: "mergesort" and "stable" are the only stable choices, and for a DataFrame the kind parameter is only applied when sorting on a single column.<br>
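The three algorithms described above can be sketched in plain Python to make the ideas concrete. These are simplified teaching versions that return new lists; they are not how pandas or NumPy implement `kind` internally (NumPy's real implementations are more elaborate, for example its "quicksort" is an introsort).

```python
import heapq


def quicksort(items):
    """Pick a pivot, partition around it, and sort each part recursively."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def mergesort(items):
    """Split in half, sort each half, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def heapsort(items):
    """Build a min-heap, then repeatedly pop the smallest element."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


ages = [28, 22, 25, 22, 28]
print(quicksort(ages))
print(mergesort(ages))
print(heapsort(ages))
```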


In [None]:
import pandas as pd
data3 = {"Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
         "Age": [28, 22, 25, 22, 28],
         "Score": [85, 90, 95, 80, 88]}
df3 = pd.DataFrame(data3)
# sort "Age" using mergesort (stable, so tied ages keep their original row order)
sort_df = df3.sort_values(by="Age",kind="mergesort")
print(sort_df)
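Stability is useful in practice: sorting by a secondary column first and then stable-sorting by the primary column gives a multi-key order. The sketch below reproduces the same order as `sort_values(by=["Age", "Score"], ascending=[True, False])` using the `df3` data above.

```python
import pandas as pd

df3 = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
                    "Age": [28, 22, 25, 22, 28],
                    "Score": [85, 90, 95, 80, 88]})

# Step 1: order by Score, highest first
by_score = df3.sort_values(by="Score", ascending=False)

# Step 2: stable sort by Age; rows with equal ages keep the Score order from step 1
by_age_then_score = by_score.sort_values(by="Age", kind="mergesort")
print(by_age_then_score)
```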

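<b>Custom sorting with a key function</b><br>
The key parameter applies a function to the sort column(s) before sorting; the function receives a Series and must return a Series of the same shape.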

In [None]:
# case-insensitive alphabetical sort by name
sort_df2 = df3.sort_values(by="Name",key=lambda col: col.str.lower())
print(sort_df2)
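A key function can also impose an order that is neither alphabetical nor numeric. This sketch uses hypothetical clothing data and a hypothetical `size_rank` mapping: the key replaces each label with its rank, so the rows are sorted small, medium, large.

```python
import pandas as pd

sizes = pd.DataFrame({"Item": ["Shirt", "Hat", "Coat", "Sock"],
                      "Size": ["large", "small", "medium", "small"]})

# Hypothetical ranking used only for sorting; the "Size" column itself is unchanged
size_rank = {"small": 0, "medium": 1, "large": 2}
by_size = sizes.sort_values(by="Size",
                            key=lambda col: col.map(size_rank),
                            kind="mergesort")  # stable, so the two "small" rows keep their order
print(by_size)
```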

<b>Key Takeaways:</b><br>

<b>1.</b> sort_values() is versatile and allows sorting by one or multiple columns.<br>
<b>2.</b> You can control whether sorting is ascending or descending using the ascending parameter.<br>
<b>3.</b> Missing values (NaN) can be placed at either the beginning or the end using the na_position parameter.<br>
<b>4.</b> Custom sorting logic can be applied using the <b>key</b> parameter.

# Learning pivot tables in pandas

In [None]:
# Create a simple DataFrame
# columns: Product, Category, Quantity, Amount
import pandas as pd
df = pd.DataFrame({"Product":["Carrots","Broccoli","Banana","Banana","Beans","Orange","Broccoli","Banana"],
                  "Category":["Vegetable","Vegetable","Fruit","Fruit","Vegetable","Fruit","Vegetable","Fruit"],
                  "Quantity":[8,5,3,4,5,9,11,8],
                  "Amount":[270,239,617,384,626,610,62,90]})
df

<b>Some examples to understand how a pivot table works</b><br>
<b>Example 1:</b> Get the total sales of each product

In [None]:
pivot_prod = df.pivot_table(index=["Product"],values=["Amount"],aggfunc=["sum"])
print(pivot_prod)
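For a single aggregation, a pivot table gives the same numbers as a `groupby`. This sketch copies the Product and Amount columns from the data above. Passing plain strings instead of lists to `index`, `values` and `aggfunc` also avoids the two-level column header (such as `sum` over `Amount`) produced in Example 1.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Carrots", "Broccoli", "Banana", "Banana",
                               "Beans", "Orange", "Broccoli", "Banana"],
                   "Amount": [270, 239, 617, 384, 626, 610, 62, 90]})

# Plain strings give a single "Amount" column instead of a MultiIndex header
pivot_flat = df.pivot_table(index="Product", values="Amount", aggfunc="sum")

# The equivalent groupby: group rows by product and sum their amounts
grouped = df.groupby("Product")["Amount"].sum()

print(pivot_flat)
print(grouped)
```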

<b>Example 2:</b> Get the total sales of each category

In [None]:
pivot_cate = df.pivot_table(index=["Category"],values=["Amount"],aggfunc=["sum"])
print(pivot_cate)

<b>Example 3:</b> Get the total sales by both product and category

In [None]:
pivot_both = df.pivot_table(index=["Product","Category"],values=["Amount"],aggfunc=["sum"])
print(pivot_both)
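Instead of stacking both keys in the index, `columns` spreads one key across the columns, giving a product-by-category grid. In this sketch `fill_value=0` fills combinations that never occur (for example, Carrots under Fruit), and `margins=True` adds an "All" row and column with the totals.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Carrots", "Broccoli", "Banana", "Banana",
                               "Beans", "Orange", "Broccoli", "Banana"],
                   "Category": ["Vegetable", "Vegetable", "Fruit", "Fruit",
                                "Vegetable", "Fruit", "Vegetable", "Fruit"],
                   "Amount": [270, 239, 617, 384, 626, 610, 62, 90]})

# Products as rows, categories as columns, total Amount in each cell
wide = df.pivot_table(index="Product", columns="Category", values="Amount",
                      aggfunc="sum", fill_value=0, margins=True)
print(wide)
```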

<b>Example 4:</b> Get the mean, median, and minimum sales by category

In [None]:
pivot_cat_3m = df.pivot_table(index=["Category"],values=["Amount"],aggfunc=["mean","median","min"])
print(pivot_cat_3m)

<b>Example 5:</b> Get the mean, median, and minimum sales by product

In [None]:
pivot_pro_3m = df.pivot_table(index=["Product"],values=["Amount"],aggfunc=["median","mean","min"])
print(pivot_pro_3m)
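Examples 4 and 5 apply every function to the same column. `aggfunc` also accepts a dictionary that maps each value column to its own function. The sketch below (using the same data) totals `Amount` but averages `Quantity` per category.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Carrots", "Broccoli", "Banana", "Banana",
                               "Beans", "Orange", "Broccoli", "Banana"],
                   "Category": ["Vegetable", "Vegetable", "Fruit", "Fruit",
                                "Vegetable", "Fruit", "Vegetable", "Fruit"],
                   "Quantity": [8, 5, 3, 4, 5, 9, 11, 8],
                   "Amount": [270, 239, 617, 384, 626, 610, 62, 90]})

# A different aggregation for each value column
summary = df.pivot_table(index="Category",
                         values=["Amount", "Quantity"],
                         aggfunc={"Amount": "sum", "Quantity": "mean"})
print(summary)
```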