Link to Medium blog post: https://towardsdatascience.com/how-to-query-your-pandas-dataframe-c6f7d64164bc

# How to Query Your Pandas Dataframe

### Multiple Conditions

![image.png](attachment:image.png)

As data scientists or data analysts, we often need to return specific rows of data. One common scenario is applying multiple conditions in a single line of code. To illustrate, I created some fake sample data with a first and last name for each person, along with their gender and birthdate. This data is shown in the screenshot above.
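For readers who want to follow along without the screenshot, here is a minimal sketch of how similar sample data could be built. The `Gender` and `Birthdate` column names come from the code in this post; the `First` and `Last` column names and every value are assumptions made up for illustration, not the actual data in the screenshot.

```python
import pandas as pd

# Hypothetical sample rows; the names and dates are invented for illustration.
df = pd.DataFrame(
    {
        "First": ["John", "Jane", "Alex", "Maria", "Sam"],
        "Last": ["Smith", "Doe", "Lee", "Garcia", "Brown"],
        "Gender": ["M", "F", "M", "F", "F"],
        "Birthdate": ["1985-03-14", "2012-07-01", "2015-11-23", "1990-01-30", "2020-05-05"],
    }
)

# Parse the date strings so later comparisons are made on real dates, not text.
df["Birthdate"] = pd.to_datetime(df["Birthdate"])

print(df.dtypes)
```

Converting `Birthdate` with `pd.to_datetime` is optional for ISO-formatted strings, but it keeps date comparisons from depending on how the text happens to be formatted.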

The multiple-condition example answers a specific question, much as a SQL query would: what percent of our data is either male OR was born between 2010 and 2021?

Here is the code that answers that question (there are several ways to do it, but this is one specific approach):

In [None]:
'''# & binds tighter than |, so the date range is grouped explicitly
print("Percent of data who are Males OR were born between 2010 and 2021:",
      round(100 * df[(df['Gender'] == 'M') |
                     ((df['Birthdate'] >= '2010-01-01') &
                      (df['Birthdate'] <= '2021-01-01'))]['Gender'].count() / df.shape[0], 2), "%")'''

To make this code easier to read, I have also included a screenshot of the same code along with its output. You can also apply these conditions to return the matching rows themselves instead of the percent of rows out of the total.

![image.png](attachment:image.png)

Here is the order of commands we performed:

- Return rows with Male Gender
- Include the OR operator |
- Return the rows with a Birthdate between 2010-01-01 and 2021-01-01
- Combine those conditions, and then divide the matching row count by the total number of rows
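The steps above can also be sketched one at a time, with each condition stored in a named boolean mask. This is only an illustrative rewrite under assumed data: the rows below are made up, and `is_male`, `born_in_range`, and `mask` are hypothetical names, not part of the original code.

```python
import pandas as pd

# Hypothetical data using the 'Gender' and 'Birthdate' column names from this post.
df = pd.DataFrame(
    {
        "Gender": ["M", "F", "M", "F", "F"],
        "Birthdate": pd.to_datetime(
            ["1985-03-14", "2012-07-01", "2015-11-23", "1990-01-30", "2020-05-05"]
        ),
    }
)

# Step 1: rows with Male gender.
is_male = df["Gender"] == "M"

# Step 2: rows with a Birthdate inside the date range (both ends inclusive).
born_in_range = df["Birthdate"].between("2010-01-01", "2021-01-01")

# Step 3: combine with OR. Named masks avoid relying on & binding tighter than |.
mask = is_male | born_in_range

# The matching rows themselves, if you want the rows rather than a percent.
matching_rows = df[mask]

# Step 4: share of all rows, expressed as a percent.
percent = round(100 * mask.mean(), 2)
print("Percent of rows that are male OR born in range:", percent, "%")
```

Taking the mean of a boolean mask gives the fraction of `True` values, so it is equivalent to dividing the matching row count by `df.shape[0]`.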

As you can see, this code is similar to something you would see in SQL. I personally find it easier in pandas because it can take less code, and all of the logic sits in one spot without scrolling up and down (but this is just my preference).

### Merging On Multiple, Specific Columns

![image.png](attachment:image.png)

You have probably seen how to merge dataframes in other tutorials, so I wanted to add an approach that I have not really seen out there: merging on multiple, specific columns. In this scenario, we want to join two dataframes where two fields are shared between them. The more key columns the dataframes share, the more useful this method becomes.

Our first dataframe is df, and we merge it with a second dataframe, df2, on the shared columns. Here is the code to achieve our expected result:

In [1]:
'''# cols is the list of key columns shared by both dataframes
merged_df = df.merge(df2, how='inner',
                     left_on=cols,
                     right_on=cols
                     )'''

To better visualize this merge and its code, I have presented the screenshot below. It shows the second dataframe, which has the First and Last names just as in the first dataframe, plus a new column, Numeric. Then we have the specific columns we want to merge on, and the result returns the Gender, Birthdate, and new Numeric columns as well. The merge columns are passed as a list named cols.

![image.png](attachment:image.png)
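For a version you can run end to end, here is a small sketch of the same merge on made-up data. The column names `First` and `Last` are assumptions standing in for the name columns in the screenshot, and all values are invented. Because the key columns have the same names in both dataframes, the sketch uses `on=cols`, which should give the same result as passing `cols` to both `left_on` and `right_on`.

```python
import pandas as pd

# Hypothetical frames; 'First' and 'Last' stand in for the name columns.
df = pd.DataFrame(
    {
        "First": ["John", "Jane", "Alex"],
        "Last": ["Smith", "Doe", "Lee"],
        "Gender": ["M", "F", "M"],
        "Birthdate": pd.to_datetime(["1985-03-14", "2012-07-01", "2015-11-23"]),
    }
)
df2 = pd.DataFrame(
    {
        "First": ["John", "Jane", "Alex"],
        "Last": ["Smith", "Doe", "Kim"],  # the last row matches on first name only
        "Numeric": [10, 20, 30],
    }
)

# The key columns shared by both frames.
cols = ["First", "Last"]

# Inner join: keep only rows where First AND Last both match.
merged_df = df.merge(df2, how="inner", on=cols)
print(merged_df)
```

In this sketch the "Alex" rows are dropped because their last names differ, which shows why merging on both columns matters when a single column is not unique.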

As you can see, this way of merging dataframes is a simple way to achieve the same results that you would get from a SQL query.