# Selecting a subset of columns

Real-life datasets are likely to contain many columns. Some of them might be redundant or irrelevant to our analysis. In such cases, we can select a subset of columns. All we need to do is write the column names in a **list** and pass that list to the DataFrame inside square brackets.



![image.png](attachment:257674f5-3c8e-47e7-befd-199572be9fe5.png)

The following code snippet selects the <font color = 'red'>product_code</font> and <font color = 'red'>price</font> columns from the <font color = 'red'>sales</font> DataFrame and prints the first five rows.

In [1]:
import pandas as pd

sales = pd.read_csv("sales.csv")

selected_columns = ["product_code","price"]

print(sales[selected_columns].head())

   product_code    price
0          4187   569.91
1          4195   712.41
2          4204   854.91
3          4219  1034.55
4          4718    26.59


We don’t have to store the list of column names in a variable first. The list can be written directly inside the square brackets, so the same operation can also be handled in a single line:

In [3]:
import pandas as pd

sales = pd.read_csv("sales.csv")

# selected_columns = ["product_code","price"]

print(sales[["product_code","price"]].head())

   product_code    price
0          4187   569.91
1          4195   712.41
2          4204   854.91
3          4219  1034.55
4          4718    26.59
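As a quick check that the two forms are equivalent, here is a minimal sketch that runs without `sales.csv`. It assumes a small in-memory DataFrame as a stand-in: the `product_code` and `price` values are copied from the output above, while the `stock` column is made up for illustration. It also shows that the columns in the result follow the order of the list, not the order in the original DataFrame.

```python
import pandas as pd

# Stand-in for sales.csv. The "stock" column and its values are hypothetical.
sales = pd.DataFrame({
    "product_code": [4187, 4195, 4204, 4219, 4718],
    "stock": [10, 20, 30, 40, 50],
    "price": [569.91, 712.41, 854.91, 1034.55, 26.59],
})

# Selecting with a list stored in a variable ...
selected_columns = ["product_code", "price"]
with_variable = sales[selected_columns]

# ... gives the same result as writing the list inline.
inline = sales[["product_code", "price"]]
same_result = with_variable.equals(inline)
print(same_result)

# The result's column order follows the list we pass in.
reordered = sales[["price", "product_code"]]
print(list(reordered.columns))
```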


It’s important to note that the **<u>desired column names need to be written in a list</u>**. Otherwise, Python passes the comma-separated names to Pandas as a single tuple, Pandas looks for one column labeled with that tuple, and we end up with a `KeyError`. Try executing the following code snippet to see the generated error.



In [5]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales["product_code","price"].head())

KeyError: ('product_code', 'price')
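The failure can also be reproduced and handled without the CSV file. The sketch below assumes a small in-memory DataFrame built from the five rows shown earlier as a stand-in for `sales.csv`. It catches the `KeyError` raised by the tuple key and then performs the correct list-based selection.

```python
import pandas as pd

# Stand-in for sales.csv, using the rows printed earlier.
sales = pd.DataFrame({
    "product_code": [4187, 4195, 4204, 4219, 4718],
    "price": [569.91, 712.41, 854.91, 1034.55, 26.59],
})

raised_key_error = False
try:
    # Without the inner brackets, the key is the tuple ("product_code", "price").
    sales["product_code", "price"]
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# With a list, Pandas selects each named column.
subset = sales[["product_code", "price"]]
print(subset.shape)
```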

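If we want the result to remain a DataFrame when selecting only one column, we still need to put the column name in a list. With a bare string, Pandas returns a <font color='red'>Series</font> instead of a <font color='red'>DataFrame</font> with one column, as the output of the following snippet shows.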

In [7]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales["price"].head())

0     569.91
1     712.41
2     854.91
3    1034.55
4      26.59
Name: price, dtype: float64
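To make the difference between the two return types concrete, here is a minimal sketch that compares both selections side by side. It assumes a small in-memory DataFrame built from the rows shown earlier, so it runs without `sales.csv`.

```python
import pandas as pd

# Stand-in for sales.csv, using the rows printed earlier.
sales = pd.DataFrame({
    "product_code": [4187, 4195, 4204, 4219, 4718],
    "price": [569.91, 712.41, 854.91, 1034.55, 26.59],
})

as_series = sales["price"]    # bare string: one-dimensional Series
as_frame = sales[["price"]]   # one-item list: DataFrame with one column

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape)           # (5,)
print(as_frame.shape)            # (5, 1)
```

Both objects hold the same values. The list form is the one to use when later steps expect DataFrame attributes such as `columns`.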