## pandas
Pandas is an open-source data manipulation and analysis library for Python. Its labeled data structures and built-in analysis tools make it a common choice among data scientists and analysts.

With Pandas, you can load, manipulate, and analyze structured data. Its two core data structures are the Series (a 1-dimensional labeled array) and the DataFrame (a 2-dimensional labeled table whose columns can each hold a different data type).

Some key features of Pandas include:
- Data alignment and handling missing data
- Data filtering, selection, and transformation
- Grouping and aggregation of data
- Merging, joining, and reshaping datasets
- Time series analysis and handling datetime data
- Basic data visualization and plotting (built on Matplotlib)
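
Several of these features can be sketched in a few lines. The example below is only an illustration: the `sales` and `states` tables, their column names, and their values are hypothetical and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; one amount is missing on purpose.
sales = pd.DataFrame({
    "city": ["delhi", "mumbai", "delhi", "pune"],
    "amount": [100.0, np.nan, 300.0, 50.0],
})

# Handling missing data: replace NaN with 0 before aggregating.
sales["amount"] = sales["amount"].fillna(0.0)

# Filtering: keep only rows with an amount of at least 100.
large = sales[sales["amount"] >= 100]

# Grouping and aggregation: total amount per city.
totals = sales.groupby("city")["amount"].sum()

# Merging: attach a column from a second (hypothetical) lookup table.
states = pd.DataFrame({
    "city": ["delhi", "mumbai", "pune"],
    "state": ["Delhi", "Maharashtra", "Maharashtra"],
})
merged = sales.merge(states, on="city", how="left")

print(totals)
print(merged)
```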

Pandas integrates well with other libraries in the Python ecosystem, such as NumPy, Matplotlib, and scikit-learn.

To get started, import Pandas under its conventional alias `pd`:
```python
import pandas as pd
```




In [1]:
import pandas as pd
import numpy as np


In [2]:
lst=[[1,2,3],[4,5,6],[7,8,9]]
# each element is itself a list, so the Series stores them with dtype object
series=pd.Series(lst)
print(series)

0    [1, 2, 3]
1    [4, 5, 6]
2    [7, 8, 9]
dtype: object


In [3]:
# index= replaces the default labels 0, 1, 2 with custom labels a, b, c
series2=pd.Series(lst,index=['a','b','c'])
print(series2)

a    [1, 2, 3]
b    [4, 5, 6]
c    [7, 8, 9]
dtype: object
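
As a minimal sketch of what the custom index is for, the labels can then be used to look values up, alongside purely positional access. The variable names `by_label`, `by_position`, `subset`, and `flat` are introduced here for illustration only.

```python
import pandas as pd

series2 = pd.Series([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=["a", "b", "c"])

by_label = series2["b"]         # look up by index label
by_position = series2.iloc[1]   # look up by integer position
subset = series2.loc["a":"b"]   # label slices include both endpoints

# A flat list of integers gives an integer dtype instead of object.
flat = pd.Series([1, 2, 3], index=["a", "b", "c"])

print(by_label)
print(subset)
print(flat.dtype)
```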


In [4]:
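# build a Series from a dictionary: the keys become the index labels
cars={'BMW':100,'Audi':200,'Benz':300}
series3=pd.Series(cars)
print(series3)
# wrapping an unnamed Series in a DataFrame gives a single column labeled 0
df=pd.DataFrame(series3)
print(df)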


BMW     100
Audi    200
Benz    300
dtype: int64
        0
BMW   100
Audi  200
Benz  300
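
The column label `0` in the output above appears because the Series has no name. One possible way to get a descriptive label, sketched here under the assumption that the numbers are some generic value per brand, is to name the Series before converting it. The labels `"value"` and `"brand"` are hypothetical.

```python
import pandas as pd

cars = {"BMW": 100, "Audi": 200, "Benz": 300}

# Naming the Series gives the resulting DataFrame column that name.
series3 = pd.Series(cars, name="value")
df = series3.to_frame()

# Alternatively, move the index into a regular column as well.
df2 = series3.rename_axis("brand").reset_index()

print(df)
print(df2)
```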


### DataFrame

In [5]:
dict1={"name":['karan','arjun','ram'],
       'age':[20,21,22],
       'city':['delhi','mumbai','pune']}
df=pd.DataFrame(dict1)
print(dict1)
print(df)

{'name': ['karan', 'arjun', 'ram'], 'age': [20, 21, 22], 'city': ['delhi', 'mumbai', 'pune']}
    name  age    city
0  karan   20   delhi
1  arjun   21  mumbai
2    ram   22    pune


In [6]:
import numpy.random as npr
my_data=npr.rand(3,4)  # 3x4 array of uniform random numbers in [0, 1)
my_row=['A','B','C']
my_col=['W','X','Y','Z']


In [7]:
#create dataframe
df=pd.DataFrame(data=my_data,index=my_row,columns=my_col)
print(df)

          W         X         Y         Z
A  0.817168  0.269376  0.760426  0.557767
B  0.027462  0.846940  0.424000  0.287968
C  0.674891  0.417205  0.137942  0.728080
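
Because the data is random, the numbers above change on every run. If reproducible values are wanted, one option is NumPy's seeded `Generator` API, sketched below. The seed value `0` is an arbitrary choice, and this swaps `npr.rand` for `default_rng`, which is not what the notebook itself uses.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(seed=0)
my_data = rng.random((3, 4))  # 3 rows, 4 columns, values in [0, 1)

df = pd.DataFrame(data=my_data, index=list("ABC"), columns=list("WXYZ"))
print(df)
```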


In [8]:
# load a CSV file into a DataFrame (this path is specific to the author's machine)
my_df2=pd.read_csv(r'C:\Users\karan\OneDrive - Indian Institute of Science\Desktop\summer term\pandas\datacsv.csv')
print(my_df2)

   Temperature(K)  T1(s)  T2(s)    Tavg      ∆T  η(mPas)
0             301  0.120  0.120  0.1200  0.0000   0.0026
1             303  0.136  0.148  0.1420  0.0060   0.0031
2             308  0.150  0.152  0.1510  0.0010   0.0033
3             313  0.160  0.165  0.1625  0.0025   0.0035
4             323  0.216  0.224  0.2200  0.0040   0.0048
5             333  0.235  0.236  0.2355  0.0005   0.0051
6             343  0.254  0.260  0.2570  0.0030   0.0056
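
Since the file path above only exists on the author's machine, the following sketch reads the same kind of data from an in-memory string, which makes it runnable anywhere. It reuses the first three rows and three columns printed above; `io.StringIO` stands in for the file, and `index_col` is an optional extra shown as an assumption about how one might want to index the table.

```python
import io
import pandas as pd

# A few rows copied from the output above, as CSV text.
csv_text = """Temperature(K),T1(s),T2(s)
301,0.120,0.120
303,0.136,0.148
308,0.150,0.152
"""

# read_csv accepts any file-like object, not only a path.
df = pd.read_csv(io.StringIO(csv_text))

# Optionally use one of the columns as the row index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="Temperature(K)")

print(df)
print(df_indexed.loc[303])
```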


In [9]:
# a single column name returns a Series; a list of names returns a DataFrame
print(my_df2['Temperature(K)'])
print(my_df2[['Temperature(K)','T1(s)']])

0    301
1    303
2    308
3    313
4    323
5    333
6    343
Name: Temperature(K), dtype: int64
   Temperature(K)  T1(s)
0             301  0.120
1             303  0.136
2             308  0.150
3             313  0.160
4             323  0.216
5             333  0.235
6             343  0.254


In [10]:
#pulling rows
print(my_df2.loc[1])  # loc selects by index label; a single label returns that row as a Series
print(my_df2.loc[1:3])  # label slices include both endpoints, so this returns rows 1, 2 and 3

Temperature(K)    303.0000
T1(s)               0.1360
T2(s)               0.1480
Tavg                0.1420
∆T                  0.0060
η(mPas)             0.0031
Name: 1, dtype: float64
   Temperature(K)  T1(s)  T2(s)    Tavg      ∆T  η(mPas)
1             303  0.136  0.148  0.1420  0.0060   0.0031
2             308  0.150  0.152  0.1510  0.0010   0.0033
3             313  0.160  0.165  0.1625  0.0025   0.0035
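
The difference between label-based `loc` and position-based `iloc` is easier to see when the index labels are not 0, 1, 2. The sketch below uses a hypothetical index of 10, 20, 30, 40 on a few of the values printed above. It also shows why the temperature appears as `303.0000` in the single-row output: a row taken across integer and float columns is upcast to `float64`.

```python
import pandas as pd

df = pd.DataFrame(
    {"Temperature(K)": [301, 303, 308, 313], "T1(s)": [0.120, 0.136, 0.150, 0.160]},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

by_label = df.loc[20:30]     # labels 20 through 30, end included
by_position = df.iloc[1:3]   # positions 1 and 2, end excluded

# A single row is a Series; mixed int/float values share one float64 dtype.
row = df.loc[20]

print(by_label)
print(row)
```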


In [11]:
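#info
# info() prints its summary directly and returns None, so wrapping it in print() adds the trailing "None" line
print(my_df2.info())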

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Temperature(K)  7 non-null      int64  
 1   T1(s)           7 non-null      float64
 2   T2(s)           7 non-null      float64
 3   Tavg            7 non-null      float64
 4   ∆T              7 non-null      float64
 5   η(mPas)         7 non-null      float64
dtypes: float64(5), int64(1)
memory usage: 468.0 bytes
None


In [12]:
#shape
print(my_df2.shape)  # (number of rows, number of columns)

(7, 6)


In [13]:
#dimension
print(my_df2.ndim)  # number of axes: always 2 for a DataFrame

2


In [14]:
#stats
print(my_df2.describe())  # summary statistics (count, mean, std, quartiles) for each numeric column


       Temperature(K)     T1(s)     T2(s)      Tavg        ∆T   η(mPas)
count        7.000000  7.000000  7.000000  7.000000  7.000000  7.000000
mean       317.714286  0.181571  0.186429  0.184000  0.002429  0.004000
std         15.882005  0.052624  0.052940  0.052739  0.002130  0.001149
min        301.000000  0.120000  0.120000  0.120000  0.000000  0.002600
25%        305.500000  0.143000  0.150000  0.146500  0.000750  0.003200
50%        313.000000  0.160000  0.165000  0.162500  0.002500  0.003500
75%        328.000000  0.225500  0.230000  0.227750  0.003500  0.004950
max        343.000000  0.254000  0.260000  0.257000  0.006000  0.005600


In [15]:
# describe specific column
print(my_df2['Temperature(K)'].describe())

count      7.000000
mean     317.714286
std       15.882005
min      301.000000
25%      305.500000
50%      313.000000
75%      328.000000
max      343.000000
Name: Temperature(K), dtype: float64


In [16]:
# describe a text column: reports count, unique, top (most frequent value) and freq instead of numeric statistics
dict1={"name":['karan','arjun','arjun','ram'],
       'age':[20,21,19,22],
       'city':['delhi','kolkata','mumbai','pune']}
df=pd.DataFrame(dict1)
print(df)
print(df['name'].describe())

    name  age     city
0  karan   20    delhi
1  arjun   21  kolkata
2  arjun   19   mumbai
3    ram   22     pune
count         4
unique        3
top       arjun
freq          2
Name: name, dtype: object
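
By default, `describe()` on a whole DataFrame summarizes only the numeric columns. As a small sketch using the same data, passing `include="all"` reports both the numeric and the text statistics in one table, with `NaN` where a statistic does not apply.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["karan", "arjun", "arjun", "ram"],
    "age": [20, 21, 19, 22],
    "city": ["delhi", "kolkata", "mumbai", "pune"],
})

numeric_only = df.describe()             # only the 'age' column
everything = df.describe(include="all")  # all three columns

print(numeric_only)
print(everything)
```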


In [17]:
# boolean indexing: keep only the rows where age is greater than 20
print(df[df['age']>20])


    name  age     city
1  arjun   21  kolkata
3    ram   22     pune
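
Filters like the one above can be combined. A minimal sketch on the same data follows: conditions are joined with `&` (and) or `|` (or), and each condition needs its own parentheses because of operator precedence. The result variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["karan", "arjun", "arjun", "ram"],
    "age": [20, 21, 19, 22],
    "city": ["delhi", "kolkata", "mumbai", "pune"],
})

# Both conditions must hold.
arjun_over_20 = df[(df["age"] > 20) & (df["name"] == "arjun")]

# Either condition may hold.
young_or_pune = df[(df["age"] < 20) | (df["city"] == "pune")]

# Membership test against a list of values.
in_cities = df[df["city"].isin(["delhi", "pune"])]

# Filter rows and pick a column in one step with loc.
names_over_20 = df.loc[df["age"] > 20, "name"]

print(arjun_over_20)
print(young_or_pune)
print(in_cities)
print(names_over_20)
```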


In [18]:
#selecting specific columns
print(df[['name','city']])

    name     city
0  karan    delhi
1  arjun  kolkata
2  arjun   mumbai
3    ram     pune


In [19]:
print(df.iloc[0,1])  # iloc selects by integer position: row 0, column 1 (the first row's age)

20


In [20]:
print(df.iloc[0:2,0:2])  # position slices exclude the end: rows 0-1 and columns 0-1

    name  age
0  karan   20
1  arjun   21


In [21]:
# attribute (dot) access to a column; equivalent to df['age']
print(df.age)

0    20
1    21
2    19
3    22
Name: age, dtype: int64
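
Dot notation is convenient but has limits, which the sketch below illustrates with a hypothetical DataFrame. It only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. Bracket access works in every case.

```python
import pandas as pd

# Hypothetical columns: 'count' clashes with a DataFrame method,
# and 'T1(s)' is not a valid Python identifier.
df = pd.DataFrame({"age": [20, 21], "count": [1, 2], "T1(s)": [0.12, 0.136]})

same = df.age.equals(df["age"])   # dot and bracket access agree here
is_method = callable(df.count)    # df.count is the method, not the column

col = df["count"]                 # brackets reach the column reliably
t1 = df["T1(s)"]                  # df.T1(s) would not parse as a column lookup

print(same, is_method)
print(col)
```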


### count

In [25]:
dog_df=pd.read_csv('dog_data.csv')
print(dog_df)

                    Breed              Color        DogName  OwnerZip
0                COCKAPOO              BROWN        CHARLEY     15236
1            GER SHEPHERD        BLACK/BROWN         TACODA     15238
2           BELG MALINOIS            BRINDLE           EICH     15238
3                   MIXED        BLACK/BROWN          ARROW     15104
4     AM PIT BULL TERRIER        WHITE/BROWN         OAKLEY     15139
...                   ...                ...            ...       ...
2665         GOLDENDOODLE              BROWN        WINSLOW     15044
2666    YORKSHIRE TERRIER        BLACK/BROWN  ROCKY KALAKOS     15220
2667              LAB MIX  WHITE/BLACK/BROWN          ELLIE     15220
2668         GOLDENDOODLE              WHITE       CLARENCE     15143
2669    SHETLAND SHEEPDOG              BLACK        GRIFFIN     15136

[2670 rows x 4 columns]
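
The notebook stops after loading the file, so what follows is an assumption about the counting this section was leading up to. Because `dog_data.csv` is not available here, the sketch uses a small hypothetical stand-in with the same `Breed` and `Color` column names. `value_counts()`, `count()`, and `nunique()` are standard Pandas methods.

```python
import pandas as pd

# Hypothetical stand-in for dog_df, using the same column names.
dogs = pd.DataFrame({
    "Breed": ["COCKAPOO", "GOLDENDOODLE", "MIXED", "GOLDENDOODLE"],
    "Color": ["BROWN", "BROWN", "BLACK/BROWN", "WHITE"],
})

# How often each breed occurs, most frequent first.
breed_counts = dogs["Breed"].value_counts()

# Number of non-null entries in every column.
non_null = dogs.count()

# Number of distinct breeds.
n_breeds = dogs["Breed"].nunique()

print(breed_counts)
print(non_null)
print(n_breeds)
```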
