- Getting started with Pandas
- An overview of Pandas `Series` and `DataFrame`
- Loading DataFrames from various data sources
- Creating a Pandas DataFrame from custom log parsers
- Reading large datasets using a chunk-reading approach
- Pandas Series and DataFrame methods and operators
- Indexing and Selecting data
- Working with missing data
- Using various data munging techniques
- Managing timestamps
- Using visualization helpers


### Pandas overview

Pandas is a fast, flexible framework for data analysis and data manipulation in Python. It works with tabular data (such as spreadsheets) and helps you analyse and visualize that data.

#### Getting started with Pandas

To install pandas:
  - pip install pandas
  
If you're using the Anaconda Python distribution, pandas comes pre-bundled with it.

In [8]:
import pandas as pd

In [9]:
pd.__version__

'2.1.4'

In [12]:
series = pd.Series(10, index=["temperature"])
series

temperature    10
dtype: int64

In [14]:
series["temperature"]

10

In [15]:
series.iloc[0]

10

In [16]:
series.iat[0]

10

In [17]:
series = pd.Series(10, index=range(5))
series

0    10
1    10
2    10
3    10
4    10
dtype: int64

In [19]:
series = pd.Series(["Guido", "Barry", "Alex", "Raymond", "Nick"], name="Pythonistas")
series

0      Guido
1      Barry
2       Alex
3    Raymond
4       Nick
Name: Pythonistas, dtype: object

In [21]:
series.name

'Pythonistas'

In [26]:
series = pd.Series(["Guido", "Barry", "Alex", "Raymond", "Nick"], 
                   index=["Founder", "Developer", "Co-Founder", "Contributor", "Developer"],
                   name="Pythonistas")
series

Founder          Guido
Developer        Barry
Co-Founder        Alex
Contributor    Raymond
Developer         Nick
Name: Pythonistas, dtype: object

In [27]:
series["Developer"]

Developer    Barry
Developer     Nick
Name: Pythonistas, dtype: object

In [34]:
series["Founder": "Contributor"]

Founder          Guido
Developer        Barry
Co-Founder        Alex
Contributor    Raymond
Name: Pythonistas, dtype: object

In [35]:
series.iloc[2:4]

Co-Founder        Alex
Contributor    Raymond
Name: Pythonistas, dtype: object
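The two slices above behave differently at the end point: a label slice includes its stop label, while a positional `iloc` slice excludes its stop position. A minimal sketch of that difference, assuming a small Series with unique labels (the name `roles` is only for illustration):

```python
import pandas as pd

# Illustrative Series with unique labels, so label slicing is unambiguous.
roles = pd.Series(
    ["Guido", "Barry", "Alex", "Raymond"],
    index=["Founder", "Developer", "Co-Founder", "Contributor"],
)

# Label-based slice: the stop label "Contributor" is included.
by_label = roles.loc["Developer":"Contributor"]

# Position-based slice: the stop position 3 is excluded.
by_position = roles.iloc[1:3]

print(by_label.to_list())     # three names
print(by_position.to_list())  # two names
```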

In [36]:
series

Founder          Guido
Developer        Barry
Co-Founder        Alex
Contributor    Raymond
Developer         Nick
Name: Pythonistas, dtype: object

In [37]:
series[["Founder", "Contributor"]]

Founder          Guido
Contributor    Raymond
Name: Pythonistas, dtype: object

In [41]:
s = pd.Series([10, 4.5, 45], index=[1, "hello", 5.6])
s

1        10.0
hello     4.5
5.6      45.0
dtype: float64

In [45]:
s.index.to_list()

[1, 'hello', 5.6]

In [47]:
s.to_list()

[10.0, 4.5, 45.0]

In [52]:
series = pd.Series({
              "username": "Raymond",
              "role": "Core Developer",
              "place": "Seattle"})
series

username           Raymond
role        Core Developer
place              Seattle
dtype: object

In [53]:
df = pd.DataFrame([10, 20, 30, 40, 50])
df

Unnamed: 0,0
0,10
1,20
2,30
3,40
4,50


In [54]:
import numpy as np

In [55]:
df = pd.DataFrame(np.arange(100).reshape((10, 10)))
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0,1,2,3,4,5,6,7,8,9
1,10,11,12,13,14,15,16,17,18,19
2,20,21,22,23,24,25,26,27,28,29
3,30,31,32,33,34,35,36,37,38,39
4,40,41,42,43,44,45,46,47,48,49
5,50,51,52,53,54,55,56,57,58,59
6,60,61,62,63,64,65,66,67,68,69
7,70,71,72,73,74,75,76,77,78,79
8,80,81,82,83,84,85,86,87,88,89
9,90,91,92,93,94,95,96,97,98,99


In [60]:
df = pd.DataFrame(
        {"name": ["Alex", "Barry", "Raymond", "Guido", "Nick"],
         "role": ["Developer", "Consultant", "Developer", "Founder", "Co-Founder"],
         "place": ["seattle", "new york", "denver", "san francisco", "new jersey"]
        })
df

Unnamed: 0,name,role,place
0,Alex,Developer,seattle
1,Barry,Consultant,new york
2,Raymond,Developer,denver
3,Guido,Founder,san francisco
4,Nick,Co-Founder,new jersey


In [62]:
df = pd.DataFrame(
       [{"name": "Alex", "role": "Developer", "place": "seattle"},
        {"name": "Barry", "role": "Consultant", "place": "new york"},
        {"name": "Guido", "role": "Founder", "place": "san francisco"}
       ])
df

Unnamed: 0,name,role,place
0,Alex,Developer,seattle
1,Barry,Consultant,new york
2,Guido,Founder,san francisco


In [65]:
data = [[34, 78],
        [26, 60],
        [28, 56]]

df = pd.DataFrame(data)
df

Unnamed: 0,0,1
0,34,78
1,26,60
2,28,56


In [67]:
data = [[34, 78],
        [26, 60],
        [28, 56]]

df = pd.DataFrame(data, index=["Chennai", "Bengaluru", "Pune"], columns=["temperature", "humidity"])
df

Unnamed: 0,temperature,humidity
Chennai,34,78
Bengaluru,26,60
Pune,28,56


In [68]:
df.temperature

Chennai      34
Bengaluru    26
Pune         28
Name: temperature, dtype: int64

In [69]:
df["temperature"]

Chennai      34
Bengaluru    26
Pune         28
Name: temperature, dtype: int64

In [71]:
df[["temperature"]]

Unnamed: 0,temperature
Chennai,34
Bengaluru,26
Pune,28


In [70]:
df[["humidity", "temperature"]]

Unnamed: 0,humidity,temperature
Chennai,78,34
Bengaluru,60,26
Pune,56,28


In [72]:
df.loc["Chennai"]

temperature    34
humidity       78
Name: Chennai, dtype: int64

In [74]:
df.iloc[0]

temperature    34
humidity       78
Name: Chennai, dtype: int64

In [75]:
df.loc[["Chennai"]]

Unnamed: 0,temperature,humidity
Chennai,34,78


In [76]:
df.loc[["Chennai", "Pune"]]

Unnamed: 0,temperature,humidity
Chennai,34,78
Pune,28,56
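As the outputs above suggest, passing a single label to `loc` returns a `Series`, while passing a list of labels returns a `DataFrame`, even when the list has one element. A short sketch that checks the types, using a cut-down copy of the city data (the name `cities` is an assumption for this example):

```python
import pandas as pd

# Cut-down copy of the city/temperature/humidity data used above.
cities = pd.DataFrame(
    [[34, 78], [26, 60]],
    index=["Chennai", "Bengaluru"],
    columns=["temperature", "humidity"],
)

row_as_series = cities.loc["Chennai"]     # single label -> Series
row_as_frame = cities.loc[["Chennai"]]    # list of labels -> DataFrame

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
```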


In [79]:
df = pd.read_csv("weather_data.csv")
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2017,32,6,Rain
1,1/2/2017,35,7,Sunny
2,1/3/2017,28,2,Snow
3,1/4/2017,24,7,Snow
4,1/5/2017,32,4,Rain
5,1/6/2017,31,2,Sunny


In [82]:
df.columns.to_list()

['day', 'temperature', 'windspeed', 'event']

In [85]:
df = pd.read_csv("weather_no_header.csv", names=["date", "place", "temperature", "humidity"])
df

Unnamed: 0,date,place,temperature,humidity
0,5/1/2017,new york,65,56
1,5/2/2017,new york,66,58
2,5/3/2017,new york,68,60
3,5/1/2017,mumbai,75,80
4,5/2/2017,mumbai,78,83
5,5/3/2017,mumbai,82,85
6,5/1/2017,beijing,80,26
7,5/2/2017,beijing,77,30
8,5/3/2017,beijing,79,35


In [88]:
df = pd.read_csv("weather_mangled_headers.csv", quotechar="^")
df

Unnamed: 0,date,city,temperature,humidity
0,5/1/2017,"new,york",65,56
1,5/2/2017,new york,66,58
2,5/3/2017,new york,68,60
3,5/1/2017,mumbai,75,80
4,5/2/2017,mumbai,78,83
5,5/3/2017,mumbai,82,85
6,5/1/2017,beijing,80,26
7,5/2/2017,beijing,77,30
8,5/3/2017,beijing,79,35


In [90]:
line = '43.134.122.128 - - [01/May/2024:00:00:22 -0400] "GET /articles/linux-system-programming/introduction-to-linux-ipc-mechanims.html HTTP/1.1" 302 594 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36" 0 0 "off:-:-" 304 1524 192.252.149.39 www.chandrashekar.info - - 43.134.122.128'

print(line)
field_names = ("client_ipaddr", "timestamp", "request_method", "request_url", "response_code", "bytes_transferred")


43.134.122.128 - - [01/May/2024:00:00:22 -0400] "GET /articles/linux-system-programming/introduction-to-linux-ipc-mechanims.html HTTP/1.1" 302 594 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36" 0 0 "off:-:-" 304 1524 192.252.149.39 www.chandrashekar.info - - 43.134.122.128


In [110]:
regex = r"""
    (?P<client_ipaddr>
      [\d\.]+               # Match IPv4 address format which are either digits or '.'
    )
    \s-\s-\s\[  # Skip the noise
    (?P<timestamp>
       .+                  # Extract timestamp found within [ ... ] brackets
    )
    \]
    \s\"
    (?P<request_method>
      GET|HEAD|POST|PUT|DELETE
    )
    \s
    (?P<request_url>
      \S+                  # Extract request url - string with no spaces in them
    )
    \s.+\"
    \s
    (?P<response_code>
      \d+
    )
    \s
    (?P<bytes_transferred>
      \d+
    )
"""

In [111]:
import re
pattern = re.compile(regex, re.VERBOSE | re.DOTALL)
match = pattern.match(line)
match.groupdict()

{'client_ipaddr': '43.134.122.128',
 'timestamp': '01/May/2024:00:00:22 -0400',
 'request_method': 'GET',
 'request_url': '/articles/linux-system-programming/introduction-to-linux-ipc-mechanims.html',
 'response_code': '304',
 'bytes_transferred': '1524'}
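Note that the captured `response_code` and `bytes_transferred` above are `304` and `1524`, whereas the sample line has `302 594` immediately after the quoted request. The greedy `.+` before the closing quote runs on to the last quote in the line that is followed by two numbers. Below is a minimal sketch of a tighter pattern, assuming the request itself never contains a double quote. The names `strict_regex` and `sample` are introduced here for illustration, and `sample` is a made-up, shortened line in the same layout:

```python
import re

# Same idea as the verbose pattern above, but each free-form field is
# restricted so that it cannot run past its own closing delimiter.
strict_regex = r"""
    (?P<client_ipaddr>[\d.]+)
    \s-\s-\s\[
    (?P<timestamp>[^\]]+)          # Everything up to the first closing bracket
    \]\s\"
    (?P<request_method>GET|HEAD|POST|PUT|DELETE)
    \s
    (?P<request_url>\S+)
    \s[^\"]+\"                     # Protocol, up to the first closing quote
    \s
    (?P<response_code>\d+)
    \s
    (?P<bytes_transferred>\d+)
"""

# Hypothetical, shortened log line (not taken from the real log file).
sample = ('43.134.122.128 - - [01/May/2024:00:00:22 -0400] '
          '"GET /index.html HTTP/1.1" 302 594 "-" "Mozilla/5.0" 0 0 "off:-:-" 304 1524')

strict_match = re.compile(strict_regex, re.VERBOSE).match(sample)
fields = strict_match.groupdict()
print(fields["response_code"], fields["bytes_transferred"])
```

With this variant the two numeric groups are taken from the fields that directly follow the request string.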

In [130]:
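# def parse_log(filename, regex, transform={}): 
def parse_log(filename, regex):
    """Yield one dict of named regex groups per matching line of the log file."""
    import re
    pattern = re.compile(regex, re.VERBOSE | re.DOTALL)
    with open(filename) as logfile:
        for line in logfile:
            match = pattern.match(line)
            if match is None:
                continue  # Skip lines that do not fit the pattern instead of raising AttributeError
            yield match.groupdict()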

In [116]:
# for rec in parse_log("www.chandrashekar.info.log", regex, transform={"timestamp": lambda x: strptime(x, format="%a/%b/...", 
#                                                                      "bytes_transferred": int}
for rec in parse_log("www.chandrashekar.info.log", regex): 
    print(rec)

{'client_ipaddr': '43.134.122.128', 'timestamp': '01/May/2024:00:00:22 -0400', 'request_method': 'GET', 'request_url': '/articles/linux-system-programming/introduction-to-linux-ipc-mechanims.html', 'response_code': '304', 'bytes_transferred': '1524'}
{'client_ipaddr': '43.134.122.128', 'timestamp': '01/May/2024:00:00:22 -0400', 'request_method': 'GET', 'request_url': '/articles/linux-system-programming/introduction-to-linux-ipc-mechanims', 'response_code': '300', 'bytes_transferred': '230003'}
{'client_ipaddr': '40.77.167.93', 'timestamp': '01/May/2024:00:13:38 -0400', 'request_method': 'GET', 'request_url': '/robots.txt', 'response_code': '211', 'bytes_transferred': '211848'}
{'client_ipaddr': '43.153.110.177', 'timestamp': '01/May/2024:00:13:41 -0400', 'request_method': 'GET', 'request_url': '/', 'response_code': '534', 'bytes_transferred': '558'}
{'client_ipaddr': '43.153.110.177', 'timestamp': '01/May/2024:00:13:43 -0400', 'request_method': 'GET', 'request_url': '/', 'response_code

In [117]:
df = pd.DataFrame(parse_log("www.chandrashekar.info.log", regex))
df

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
0,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,304,1524
1,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,300,230003
2,40.77.167.93,01/May/2024:00:13:38 -0400,GET,/robots.txt,211,211848
3,43.153.110.177,01/May/2024:00:13:41 -0400,GET,/,534,558
4,43.153.110.177,01/May/2024:00:13:43 -0400,GET,/,573,225575
5,52.167.144.205,01/May/2024:00:13:48 -0400,GET,/storage/app/media/profile/chandrashekar-profi...,347,1917
6,146.196.34.100,01/May/2024:00:20:39 -0400,GET,/sites/default/files/color/pixture_reloaded-14...,295,248902
7,52.230.152.203,01/May/2024:00:21:35 -0400,GET,/robots.txt,247,227885
8,52.230.152.203,01/May/2024:00:21:35 -0400,GET,/robots.txt,1227,239337
9,146.196.34.100,01/May/2024:00:38:44 -0400,GET,/sites/default/files/color/pixture_reloaded-14...,295,244347


In [120]:
import numpy as np

In [122]:
df.bytes_transferred = df.bytes_transferred.astype(np.int64)

In [124]:
df.bytes_transferred.sum()

1631896

In [128]:
pd.to_datetime("01/May/2024:00:00:22 -0400", format="%d/%b/%Y:%H:%M:%S %z")

Timestamp('2024-05-01 00:00:22-0400', tz='UTC-04:00')

In [129]:
df.timestamp = pd.to_datetime(df.timestamp, format="%d/%b/%Y:%H:%M:%S %z")
df.timestamp.dtype

datetime64[ns, UTC-04:00]
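The conversions above are applied after the DataFrame is built. Another option is to convert fields while parsing. The sketch below is one possible shape for that, not a definitive implementation: `parse_lines`, `compact_pattern`, the `transform` mapping and the sample log line are all assumptions made for this example. It reads from any iterable of lines so it can run without the log file.

```python
import re
from datetime import datetime

# Compact, hypothetical pattern covering the same six fields as above.
compact_pattern = re.compile(
    r'(?P<client_ipaddr>[\d.]+) - - \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_method>[A-Z]+) (?P<request_url>\S+) [^"]*" '
    r'(?P<response_code>\d+) (?P<bytes_transferred>\d+)'
)

def parse_lines(lines, pattern, transform=None):
    """Yield one dict per matching line, applying per-field converters."""
    transform = transform or {}
    for line in lines:
        match = pattern.match(line)
        if match is None:
            continue  # Ignore lines that do not fit the pattern
        record = match.groupdict()
        for field, convert in transform.items():
            record[field] = convert(record[field])
        yield record

# Made-up log line used only to exercise the function.
sample_lines = [
    '40.77.167.93 - - [01/May/2024:00:13:38 -0400] "GET /robots.txt HTTP/1.1" 200 211',
    'this line does not match and is skipped',
]

records = list(parse_lines(sample_lines, compact_pattern, transform={
    "timestamp": lambda text: datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z"),
    "bytes_transferred": int,
}))
print(records[0]["timestamp"], records[0]["bytes_transferred"])
```

Passing the resulting generator or list to `pd.DataFrame(...)` would then give columns that already hold converted values.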

In [131]:
df = pd.DataFrame(parse_log("www.chandrashekar.info.log", regex))

In [132]:
df.shape

(50082, 6)

In [134]:
df.head(2)

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
0,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,304,1524
1,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,300,230003


In [143]:
pd.set_option("display.max_columns", 30) # Allow displaying up to 30 columns in a row.

In [144]:
df.head(2)

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
0,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,304,1524
1,43.134.122.128,01/May/2024:00:00:22 -0400,GET,/articles/linux-system-programming/introductio...,300,230003


In [145]:
df.describe()

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
count,50082,50082,50082,50082,50082,50082
unique,2903,20252,3,5750,1146,19931
top,194.233.89.203,03/May/2024:21:24:34 -0400,GET,/,181,495
freq,4094,55,49656,3111,1614,168


In [146]:
df[["client_ipaddr", "bytes_transferred"]]

Unnamed: 0,client_ipaddr,bytes_transferred
0,43.134.122.128,1524
1,43.134.122.128,230003
2,40.77.167.93,211848
3,43.153.110.177,558
4,43.153.110.177,225575
...,...,...
50077,172.203.63.110,664
50078,172.203.63.110,201651
50079,198.235.24.48,98972
50080,198.235.24.48,40519


In [150]:
df["client_ipaddr"].value_counts().head(5)

client_ipaddr
194.233.89.203    4094
159.223.95.58     3244
152.42.228.28     3244
167.172.86.134    1800
84.247.116.138    1673
Name: count, dtype: int64

In [152]:
df["bytes_transferred"].max()

'999306'
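The result `'999306'` is a string: at this point `bytes_transferred` still holds text, so `max()` compares the values character by character rather than numerically. A small sketch of the difference, using a hypothetical three-element Series named `sizes`:

```python
import pandas as pd

# Hypothetical byte counts stored as text, as they come out of the regex.
sizes = pd.Series(["999306", "1524", "305548600"])

text_max = sizes.max()                     # lexicographic comparison
numeric_max = sizes.astype("int64").max()  # numeric comparison

print(repr(text_max))
print(numeric_max)
```

Converting the column with `astype` before aggregating avoids this pitfall.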

In [155]:
df[df.bytes_transferred == df.bytes_transferred.max()].client_ipaddr

13602    89.43.208.21
Name: client_ipaddr, dtype: object

In [157]:
df.describe()

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
count,50082,50082,50082,50082,50082,50082
unique,2903,20252,3,5750,1146,19931
top,194.233.89.203,03/May/2024:21:24:34 -0400,GET,/,181,495
freq,4094,55,49656,3111,1614,168


In [158]:
df.shape

(50082, 6)

In [161]:
df.bytes_transferred = df.bytes_transferred.astype(np.int32)
df.timestamp = pd.to_datetime(df.timestamp, format="%d/%b/%Y:%H:%M:%S %z")
df.describe()

Unnamed: 0,bytes_transferred
count,50082.0
mean,97069.69
std,1374216.0
min,210.0
25%,532.0
50%,666.0
75%,201251.8
max,305548600.0


In [162]:
df.describe(include="all")

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
count,50082,50082,50082,50082,50082.0,50082.0
unique,2903,,3,5750,1146.0,
top,194.233.89.203,,GET,/,181.0,
freq,4094,,49656,3111,1614.0,
mean,,2024-05-12 19:51:29.481270528-04:00,,,,97069.69
min,,2024-05-01 00:00:22-04:00,,,,210.0
25%,,2024-05-08 06:05:20.750000128-04:00,,,,532.0
50%,,2024-05-14 07:39:35-04:00,,,,666.0
75%,,2024-05-17 18:41:44-04:00,,,,201251.8
max,,2024-05-20 23:47:30-04:00,,,,305548600.0


In [166]:
df.isna().sum() # An easy way to check for missing values and count them per column

client_ipaddr        0
timestamp            0
request_method       0
request_url          0
response_code        0
bytes_transferred    0
dtype: int64

In [172]:
df[df.duplicated()]

Unnamed: 0,client_ipaddr,timestamp,request_method,request_url,response_code,bytes_transferred
33638,182.204.143.236,2024-05-16 04:42:04-04:00,GET,/themes/chandrashekar_babu_html5up_halcyonic/a...,216,1530


In [174]:
df = df.drop_duplicates()

In [176]:
df.shape

(50081, 6)

In [177]:
df = pd.read_csv("weather_data_missing_fields.csv")
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,,Cloudy
5,1/6/2020,,7.0,Rain
6,1/7/2020,22.0,,Rain
7,1/8/2020,,,Sunny
8,1/9/2020,,,
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [179]:
df.isna().sum()

day            0
temperature    4
windspeed      4
event          1
dtype: int64

In [183]:
df.dropna()

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [186]:
df.dropna().reset_index(drop=True)

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/10/2020,20.0,8.0,Cloudy
5,1/11/2020,24.0,12.0,Sunny


In [187]:
df.dropna(inplace=True)
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [188]:
df.reset_index(inplace=True, drop=True)
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/10/2020,20.0,8.0,Cloudy
5,1/11/2020,24.0,12.0,Sunny


In [195]:
df = pd.read_csv("weather_data_missing_fields.csv")
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,,Cloudy
5,1/6/2020,,7.0,Rain
6,1/7/2020,22.0,,Rain
7,1/8/2020,,,Sunny
8,1/9/2020,,,
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [193]:
df.temperature = df.temperature.interpolate()

In [201]:
df.temperature.interpolate()

0     20.000000
1     23.000000
2     21.000000
3     18.000000
4     19.333333
5     20.666667
6     22.000000
7     21.333333
8     20.666667
9     20.000000
10    24.000000
Name: temperature, dtype: float64
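By default, `interpolate()` fills each gap linearly, so the filled values are evenly spaced between the nearest known neighbours. For rows 4 and 5 above, the step is (22 - 18) / 3, which gives 19.33 and 20.67. A minimal sketch that reproduces this arithmetic on a made-up four-element Series (`temps` is an illustrative name):

```python
import numpy as np
import pandas as pd

# Two missing readings between the known values 18.0 and 22.0.
temps = pd.Series([18.0, np.nan, np.nan, 22.0])

filled = temps.interpolate()   # default method is linear
step = (22.0 - 18.0) / 3       # three equal steps across the gap

print(filled.to_list())
```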

In [196]:
df

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,,Cloudy
5,1/6/2020,,7.0,Rain
6,1/7/2020,22.0,,Rain
7,1/8/2020,,,Sunny
8,1/9/2020,,,
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [199]:
df.temperature.ffill()

0     20.0
1     23.0
2     21.0
3     18.0
4     18.0
5     18.0
6     22.0
7     22.0
8     22.0
9     20.0
10    24.0
Name: temperature, dtype: float64

In [203]:
df.fillna({
     "temperature": df.temperature.mean(),
     "event": df["event"].value_counts().index[0],
     "windspeed": 7.0})

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,21.142857,7.0,Cloudy
5,1/6/2020,21.142857,7.0,Rain
6,1/7/2020,22.0,7.0,Rain
7,1/8/2020,21.142857,7.0,Sunny
8,1/9/2020,21.142857,7.0,Rain
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [205]:
fn = lambda x: x[-4:]

df.fillna({
     "temperature": fn(df.day),
     "event": df["event"].value_counts().index[0],
     "windspeed": 7.0})

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
1,1/2/2020,23.0,6.0,Sunny
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,7.0,Cloudy
5,1/6/2020,,7.0,Rain
6,1/7/2020,22.0,7.0,Rain
7,1/8/2020,1/8/2020,7.0,Sunny
8,1/9/2020,1/9/2020,7.0,Rain
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny
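In the output above, only rows 7 and 8 of `temperature` were filled, and with whole date strings. This is because `fn(df.day)` slices the Series itself and returns its last four rows, rather than the last four characters of each string. A small sketch of the difference, using a made-up `days` Series; the `.str` accessor is the element-wise alternative:

```python
import pandas as pd

days = pd.Series(["1/1/2020", "1/2/2020", "1/3/2020", "1/4/2020", "1/5/2020"])

# Slicing the Series selects rows: here the last four rows, labels 1 to 4.
last_rows = days[-4:]

# Slicing through .str works on each string: here the last four characters.
years = days.str[-4:]

print(last_rows.index.to_list())
print(years.to_list())
```

When such a row slice is passed to `fillna`, values are aligned by index label, which is why only missing rows that share a label with the slice receive a value.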


In [209]:
df.transform(lambda x: print(x))

0      1/1/2020
1      1/2/2020
2      1/3/2020
3      1/4/2020
4      1/5/2020
5      1/6/2020
6      1/7/2020
7      1/8/2020
8      1/9/2020
9     1/10/2020
10    1/11/2020
Name: day, dtype: object
0     20.0
1     23.0
2     21.0
3     18.0
4      NaN
5      NaN
6     22.0
7      NaN
8      NaN
9     20.0
10    24.0
Name: temperature, dtype: float64
0      6.0
1      6.0
2      6.0
3      9.0
4      NaN
5      7.0
6      NaN
7      NaN
8      NaN
9      8.0
10    12.0
Name: windspeed, dtype: float64
0       Rain
1      Sunny
2       Rain
3     Cloudy
4     Cloudy
5       Rain
6       Rain
7      Sunny
8        NaN
9     Cloudy
10     Sunny
Name: event, dtype: object


ValueError: Function did not transform
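The printed output shows that `transform` calls the function once per column, passing each column as a `Series`. The `ValueError` arises because `print` returns `None`, while `transform` expects a result with the same shape as its input. A minimal sketch with a function that does return a same-length result, on a made-up numeric frame named `readings`:

```python
import pandas as pd

readings = pd.DataFrame({
    "temperature": [20.0, 23.0],
    "windspeed": [6.0, 9.0],
})

# The lambda receives one column at a time and returns a Series of equal length.
doubled = readings.transform(lambda column: column * 2)

print(doubled)
```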

In [218]:
df[(df.temperature >= 18) & (df.temperature <= 21)]

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
9,1/10/2020,20.0,8.0,Cloudy


In [221]:
df.query("18 <= temperature <= 21")

Unnamed: 0,day,temperature,windspeed,event
0,1/1/2020,20.0,6.0,Rain
2,1/3/2020,21.0,6.0,Rain
3,1/4/2020,18.0,9.0,Cloudy
9,1/10/2020,20.0,8.0,Cloudy


In [219]:
df[(df.event == "Sunny") | (df.event == "Cloudy")]

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2020,23.0,6.0,Sunny
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,,Cloudy
7,1/8/2020,,,Sunny
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny


In [220]:
df.query("event == 'Sunny' or event == 'Cloudy'")

Unnamed: 0,day,temperature,windspeed,event
1,1/2/2020,23.0,6.0,Sunny
3,1/4/2020,18.0,9.0,Cloudy
4,1/5/2020,,,Cloudy
7,1/8/2020,,,Sunny
9,1/10/2020,20.0,8.0,Cloudy
10,1/11/2020,24.0,12.0,Sunny
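The range and membership filters above can also be written with `Series.between` (inclusive at both ends by default) and `Series.isin`. A short sketch on a made-up four-row frame; the name `weather` and its values are assumptions for this example:

```python
import pandas as pd

weather = pd.DataFrame({
    "temperature": [20.0, 23.0, 21.0, 18.0],
    "event": ["Rain", "Sunny", "Rain", "Cloudy"],
})

# Equivalent to (temperature >= 18) & (temperature <= 21).
mild = weather[weather["temperature"].between(18, 21)]

# Equivalent to (event == "Sunny") | (event == "Cloudy").
dry = weather[weather["event"].isin(["Sunny", "Cloudy"])]

print(mild.index.to_list())
print(dry.index.to_list())
```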


In [228]:
df = pd.DataFrame({
    "day": ["1/1/2020", "1/2/2020", "1/3/2020"],
    "temperature": [28, 25, 32],
    "event": ["Cloudy", "Sunny", "Sunny"]
}, index=["Chennai", "Pune", "Noida"])
df

Unnamed: 0,day,temperature,event
Chennai,1/1/2020,28,Cloudy
Pune,1/2/2020,25,Sunny
Noida,1/3/2020,32,Sunny


In [233]:
df.loc["Chennai"]  # Label-based row lookup; df.iloc[0] returns the same row

day            1/1/2020
temperature          28
event            Cloudy
Name: Chennai, dtype: object