In [3]:
import numpy as np
import pandas as pd



In [5]:
print(pd.__doc__)


pandas - a powerful data analysis and manipulation library for Python

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating
    point data.
  - Size mutability: columns can be inserted and deleted from DataFrame and
    higher dimensional objects
  - Automatic and explicit data alignment: objects can be explicitly aligned
    to a set of labels, or the user can simply ignore the labels and

In [7]:
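# Column labels for the three CA score columns
column_name = ['CA1', 'CA2', 'CA3']
# Row labels 1 to 50 (np.arange excludes the stop value 51)
sit_no = np.arange(1, 51)
# Fix the seed so the random scores are reproducible
np.random.seed(12)
# 150 random integers from 1 to 9 (the upper bound 10 is exclusive)
cascore = np.random.randint(1, 10, 150)
# Arrange them as 50 rows x 3 columns
cascore = cascore.reshape(50, 3)

df = pd.DataFrame(data=cascore, index=sit_no, columns=column_name)
df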

Unnamed: 0,CA1,CA2,CA3
1,7,2,3
2,4,4,1
3,7,2,5
4,6,3,7
5,1,6,9
6,3,4,5
7,4,2,8
8,1,3,7
9,3,1,5
10,7,1,1
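
The seed call in the cell above is what makes the table repeatable. A minimal sketch of that behaviour, using a smaller draw than the notebook's (the names `first` and `second` are illustrative and not part of the notebook):

```python
import numpy as np

# Seeding before each draw restarts the generator from the same state
np.random.seed(12)
first = np.random.randint(1, 10, 6).reshape(2, 3)

np.random.seed(12)
second = np.random.randint(1, 10, 6).reshape(2, 3)

# Both draws contain the same values because the seed was reset in between
print(np.array_equal(first, second))
# The upper bound of randint is exclusive, so no value reaches 10
print(first.max() < 10)
```

Without the second seed call, the second draw would continue the random sequence and would generally differ from the first.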


In [8]:
# Retrieve a single column (the result is a Series)
df['CA2']

1     2
2     4
3     2
4     3
5     6
6     4
7     2
8     3
9     1
10    1
11    7
12    5
13    2
14    4
15    6
16    7
17    5
18    8
19    5
20    6
21    7
22    7
23    3
24    6
25    6
26    4
27    9
28    5
29    3
30    9
31    4
32    7
33    4
34    7
35    5
36    2
37    3
38    2
39    6
40    1
41    6
42    2
43    8
44    2
45    2
46    9
47    9
48    2
49    3
50    4
Name: CA2, dtype: int32

In [9]:
#retrieving multiple columns
df[['CA1','CA3']]

Unnamed: 0,CA1,CA3
1,7,3
2,4,1
3,7,5
4,6,7
5,1,9
6,3,5
7,4,8
8,1,7
9,3,5
10,7,1
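
The two cells above return different kinds of objects: single brackets give a Series, while a list of names gives a DataFrame. A small sketch of the difference, using a hypothetical three-row frame called `demo` in place of the 50-row `df`:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the notebook's df
demo = pd.DataFrame(
    {"CA1": [7, 4, 7], "CA2": [2, 4, 2], "CA3": [3, 1, 5]},
    index=[1, 2, 3],
)

single = demo["CA2"]          # one name in single brackets -> Series
as_frame = demo[["CA2"]]      # list with one name -> one-column DataFrame
pair = demo[["CA1", "CA3"]]   # list with two names -> two-column DataFrame

print(type(single).__name__, single.shape)
print(type(as_frame).__name__, as_frame.shape)
print(type(pair).__name__, pair.shape)
```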


In [10]:
# Retrieve a single row by its index label
# loc selects by label (here, the row labeled 15)
df.loc[15]


CA1    6
CA2    6
CA3    1
Name: 15, dtype: int32

In [11]:
# Retrieve multiple rows by index label (pass a list of labels)
df.loc[[14,18,25,26]]

Unnamed: 0,CA1,CA2,CA3
14,6,4,5
18,7,8,5
25,1,6,5
26,1,4,8


In [12]:
# Retrieve a single row by integer position
# iloc selects by position, counting from 0, so position 14 is the row labeled 15
df.iloc[14]

CA1    6
CA2    6
CA3    1
Name: 15, dtype: int32

In [13]:
# Retrieve multiple rows by integer position (pass a list of positions)
df.iloc[[13,17,24,25]]

Unnamed: 0,CA1,CA2,CA3
14,6,4,5
18,7,8,5
25,1,6,5
26,1,4,8


In [14]:
# Create a new column at a chosen position with the insert method
# Ages are drawn from 13 to 15 (the upper bound 16 is exclusive)
age_value = np.random.randint(13, 16, 50)
df.insert(loc=0, column='AGE', value=age_value)
df

Unnamed: 0,AGE,CA1,CA2,CA3
1,15,7,2,3
2,15,4,4,1
3,13,7,2,5
4,14,6,3,7
5,14,1,6,9
6,14,3,4,5
7,14,4,2,8
8,13,1,3,7
9,14,3,1,5
10,14,7,1,1
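
One thing to watch for when re-running a cell like the one above: `insert` refuses to add a column whose name already exists unless `allow_duplicates=True` is passed. A minimal sketch on a hypothetical two-row frame (`demo` and `rerun_error` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frame standing in for the notebook's df
demo = pd.DataFrame({"CA1": [7, 4]}, index=[1, 2])

# loc=0 places the new column first
demo.insert(loc=0, column="AGE", value=[15, 15])
print(list(demo.columns))

# Inserting the same column name a second time raises ValueError
rerun_error = None
try:
    demo.insert(loc=0, column="AGE", value=[13, 14])
except ValueError as exc:
    rerun_error = exc
print(type(rerun_error).__name__)
```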


In [16]:
# Create a new column by assignment; it is appended as the last column
np.random.seed(12)
# Exam scores from 30 to 70 (the upper bound 71 is exclusive)
df['ExamScore'] = np.random.randint(30, 71, 50)
df

Unnamed: 0,AGE,CA1,CA2,CA3,ExamScore
1,15,7,2,3,41
2,15,4,4,1,57
3,13,7,2,5,36
4,14,6,3,7,32
5,14,1,6,9,33
6,14,3,4,5,33
7,14,4,2,8,42
8,13,1,3,7,52
9,14,3,1,5,35
10,14,7,1,1,43


In [17]:
# Create a new column from existing columns
df['TotalScore'] = df['CA1'] + df['CA2'] + df['CA3'] + df['ExamScore']
df

Unnamed: 0,AGE,CA1,CA2,CA3,ExamScore,TotalScore
1,15,7,2,3,41,53
2,15,4,4,1,57,66
3,13,7,2,5,36,50
4,14,6,3,7,32,48
5,14,1,6,9,33,49
6,14,3,4,5,33,45
7,14,4,2,8,42,56
8,13,1,3,7,52,63
9,14,3,1,5,35,44
10,14,7,1,1,43,52
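
When many columns are added together, a row-wise `sum` over a list of column names can replace the chain of `+` operators. A sketch using the first two rows of the table above (the frame `demo` and the list `score_columns` are illustrative names):

```python
import pandas as pd

# First two rows of the notebook's table, rebuilt as a small frame
demo = pd.DataFrame(
    {"CA1": [7, 4], "CA2": [2, 4], "CA3": [3, 1], "ExamScore": [41, 57]},
    index=[1, 2],
)

score_columns = ["CA1", "CA2", "CA3", "ExamScore"]
# axis=1 sums across the columns of each row
demo["TotalScore"] = demo[score_columns].sum(axis=1)
print(demo["TotalScore"].tolist())  # [53, 66]
```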


In [18]:
def grade(x):
    """Map a total score to a letter grade."""
    # Half-open ranges leave no gaps between bands for non-integer scores
    if 70 <= x <= 100:
        return 'A'
    elif 60 <= x < 70:
        return 'B'
    elif 50 <= x < 60:
        return 'C'
    elif 40 <= x < 50:
        return 'D'
    elif 30 <= x < 40:
        return 'E'
    else:
        return 'F'

# Apply the function to every value in TotalScore
df['GRADE'] = df['TotalScore'].apply(grade)

In [19]:
df

Unnamed: 0,AGE,CA1,CA2,CA3,ExamScore,TotalScore,GRADE
1,15,7,2,3,41,53,C
2,15,4,4,1,57,66,B
3,13,7,2,5,36,50,C
4,14,6,3,7,32,48,D
5,14,1,6,9,33,49,D
6,14,3,4,5,33,45,D
7,14,4,2,8,42,56,C
8,13,1,3,7,52,63,B
9,14,3,1,5,35,44,D
10,14,7,1,1,43,52,C
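
As an alternative to `apply` with a Python function, the same banding can be sketched with `pd.cut`, which assigns each score to a bin in one vectorized call. This is not the notebook's method, and it assumes scores never exceed 100 (the `grade` function above returns 'F' for such values, while this sketch would return 'A'). The names `totals`, `bins`, and `labels` are illustrative.

```python
import numpy as np
import pandas as pd

# First five TotalScore values from the table above
totals = pd.Series([53, 66, 50, 48, 49], index=[1, 2, 3, 4, 5], name="TotalScore")

# Bin edges; with right=False each bin includes its left edge only,
# e.g. [50, 60) is 'C' and [60, 70) is 'B'
bins = [-np.inf, 30, 40, 50, 60, 70, np.inf]
labels = ["F", "E", "D", "C", "B", "A"]

grades = pd.cut(totals, bins=bins, labels=labels, right=False)
print(grades.tolist())  # ['C', 'B', 'C', 'D', 'D']
```

The result is a categorical Series, so the grades keep a defined order from 'F' to 'A'.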


In [20]:
# How to drop a column
# axis=1 refers to columns, axis=0 refers to rows
# inplace=True modifies df itself; inplace=False (the default) returns a new DataFrame and leaves df unchanged
df.drop('GRADE', axis=1, inplace=True)
df

Unnamed: 0,AGE,CA1,CA2,CA3,ExamScore,TotalScore
1,15,7,2,3,41,53
2,15,4,4,1,57,66
3,13,7,2,5,36,50
4,14,6,3,7,32,48
5,14,1,6,9,33,49
6,14,3,4,5,33,45
7,14,4,2,8,42,56
8,13,1,3,7,52,63
9,14,3,1,5,35,44
10,14,7,1,1,43,52
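
The effect of `inplace` is easiest to see side by side. A minimal sketch on a hypothetical two-row frame (`demo`, `trimmed`, and `result` are illustrative names):

```python
import pandas as pd

# Hypothetical two-row frame standing in for the notebook's df
demo = pd.DataFrame({"CA1": [7, 4], "GRADE": ["C", "B"]}, index=[1, 2])

# Default (inplace=False): a new frame is returned, demo is unchanged
trimmed = demo.drop("GRADE", axis=1)
still_there = "GRADE" in demo.columns
print(list(trimmed.columns), still_there)

# inplace=True: demo itself is modified and the call returns None
result = demo.drop("GRADE", axis=1, inplace=True)
print(result, list(demo.columns))
```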


In [21]:
# How to drop a row (here, the row labeled 50)
df.drop(50, axis=0, inplace=True)
df

Unnamed: 0,AGE,CA1,CA2,CA3,ExamScore,TotalScore
1,15,7,2,3,41,53
2,15,4,4,1,57,66
3,13,7,2,5,36,50
4,14,6,3,7,32,48
5,14,1,6,9,33,49
6,14,3,4,5,33,45
7,14,4,2,8,42,56
8,13,1,3,7,52,63
9,14,3,1,5,35,44
10,14,7,1,1,43,52
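
`drop` removes rows by index label, not by position, and re-running the cell above would fail because label 50 no longer exists. The sketch below, on a hypothetical four-row frame, drops several labels at once and shows how `errors="ignore"` skips labels that are missing (the names `demo`, `fewer`, `unchanged`, and `missing_error` are illustrative).

```python
import pandas as pd

# Hypothetical frame whose index labels start at 1, like the notebook's df
demo = pd.DataFrame({"CA1": [7, 4, 7, 6]}, index=[1, 2, 3, 4])

# Drop several rows by passing a list of labels
fewer = demo.drop(index=[2, 4])
print(list(fewer.index))  # [1, 3]

# A missing label raises KeyError by default
missing_error = None
try:
    demo.drop(index=99)
except KeyError as exc:
    missing_error = exc
print(type(missing_error).__name__)

# errors="ignore" skips missing labels and returns the frame unchanged
unchanged = demo.drop(index=99, errors="ignore")
print(len(unchanged))  # 4
```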
