### Data Frame
    A data frame is a two-dimensional, table-like structure in which each column holds the values of one variable and each row holds one set of values, one from each column. A data frame is a special case of a list in which every component has the same length.
    
    In simple terms, it is a list of equal-length vectors. A matrix can hold only one type of data, but a data frame can hold columns of different types, such as numeric, character, and factor.
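
The "list of equal-length vectors" idea can be checked directly. The following is a minimal sketch using made-up values (the names `df` and `m` are illustrative only and are not used elsewhere in this notebook):

```r
# A small data frame with three column types (illustrative values).
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

print(is.list(df))        # a data frame is a list of columns
print(sapply(df, class))  # each column keeps its own type
print(lengths(df))        # every column has the same length

# A matrix holds a single type, so mixing numbers and text
# coerces every element to character.
m <- cbind(1:3, c("a", "b", "c"))
print(typeof(m))
```

The matrix ends up as all character, while the data frame keeps integer, character, and numeric columns side by side.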
    
#### A data frame has the following characteristics.
            1. The column names should be non-empty.
            2. The row names should be unique.
            3. The data stored in a data frame can be of factor, numeric, or character type.
            4. Each column contains the same number of data items.
![image.png](attachment:image.png)
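
The characteristics listed above can be observed in a short experiment. This is only a sketch with made-up values; the object names (`len_check`, `dup_cols`, `row_check`) are hypothetical, and the strings returned by the handlers are our own labels, not R's actual error messages.

```r
# Illustrative values only; none of these objects appear elsewhere in the notebook.

# Columns whose lengths cannot be recycled to a common length are rejected.
len_check <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error: differing number of rows"
)
print(len_check)

# Duplicate column names are made unique by default (check.names = TRUE).
dup_cols <- data.frame(a = 1, a = 2)
print(names(dup_cols))

# Row names must be unique; assigning duplicates fails.
df <- data.frame(x = 1:2)
row_check <- tryCatch(
  {
    rownames(df) <- c("r1", "r1")
    "ok"
  },
  warning = function(w) "error: duplicate row names",
  error   = function(e) "error: duplicate row names"
)
print(row_check)
```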

## How to create Data Frame

In R, data frames are created with the data.frame() function. This function accepts vectors of any type, such as numeric, character, or integer. In the example below, we create a data frame that contains employee id (integer vector), employee name (character vector), salary (numeric vector), and starting date (Date vector).

Example

    # Creating the data frame.
    emp.data <- data.frame(
        employee_id = c(1:5),
        employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
        sal = c(623.3, 915.2, 611.0, 729.0, 843.25),
        starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                                  "2014-05-11", "2015-03-27")),
        stringsAsFactors = FALSE
    )

    # Printing the data frame.
    print(emp.data)
        
Output

      employee_id employee_name    sal starting_date
    1           1       Shubham 623.30    2012-01-01
    2           2        Arpita 915.20    2013-09-23
    3           3        Nishka 611.00    2014-11-15
    4           4        Gunjan 729.00    2014-05-11
    5           5         Sumit 843.25    2015-03-27
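
The `stringsAsFactors` argument used in the example decides whether character vectors are stored as character columns or converted to factors. Its default changed from `TRUE` to `FALSE` in R 4.0.0, so setting it explicitly keeps the result the same across versions. A minimal sketch with made-up names (`chr_df` and `fct_df` are illustrative only):

```r
# Same input, two settings of stringsAsFactors (illustrative values).
chr_df <- data.frame(name = c("x", "y"), stringsAsFactors = FALSE)
fct_df <- data.frame(name = c("x", "y"), stringsAsFactors = TRUE)

print(class(chr_df$name))  # stays a character vector
print(class(fct_df$name))  # converted to a factor
```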

In [1]:
# Data frames in R
# Characteristics of a data frame:
# 1. Column names should be non-empty and unique, i.e. no two columns
#    should share the same name.
# 2. Row names should be unique.
# 3. The stored data can be numeric, character, or factor.
# 4. Each column should contain the same number of data items.

In [2]:
# Create a DataFrame with the help of inbuilt command data.frame()
# Syntax  :    data.frame(vector_1,vector_2,..........)
# Suppose we have to create the dataframe for employee

# First create a vector for emp_id,emp_name,sal and date of joining
emp_id = 101:110
emp_name = c("Nitesh","Rahul","Raj","Vishal","Seema","Deepa","Pankaj","Kamal","Leela","Ram")
sal = c(5000,4500,5600,1800,4200,800,2300,2000,1000,1500)
DOJ = as.Date(c("2014-03-23","2012-09-16","2014-03-23","2018-01-19","2021-05-04",
             "2020-07-26","2019-04-04","2018-05-28","2020-12-08","2021-02-15"))

In [3]:
# Create a DataFrame with the help of vector
emp<- data.frame(emp_id,emp_name,sal,DOJ)
print(emp)

   emp_id emp_name  sal        DOJ
1     101   Nitesh 5000 2014-03-23
2     102    Rahul 4500 2012-09-16
3     103      Raj 5600 2014-03-23
4     104   Vishal 1800 2018-01-19
5     105    Seema 4200 2021-05-04
6     106    Deepa  800 2020-07-26
7     107   Pankaj 2300 2019-04-04
8     108    Kamal 2000 2018-05-28
9     109    Leela 1000 2020-12-08
10    110      Ram 1500 2021-02-15
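
When vectors are passed to data.frame() without names, the column names are taken from the variable names, as seen above. Columns can also be named explicitly, and the size of the result can be checked with dim(), nrow(), and ncol(). A small sketch under those assumptions (`ids`, `people`, and `emp_small` are hypothetical names for illustration):

```r
# Hypothetical input vectors for illustration.
ids    <- 101:103
people <- c("Nitesh", "Rahul", "Raj")

# Name the columns explicitly instead of relying on the variable names.
emp_small <- data.frame(emp_id = ids, emp_name = people,
                        stringsAsFactors = FALSE)

print(names(emp_small))  # column names
print(dim(emp_small))    # rows, then columns
print(nrow(emp_small))
print(ncol(emp_small))
```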


#### Getting the structure of R Data Frame
    In R, we can inspect the structure of a data frame. R provides a built-in function called str() that compactly displays the structure: the number of observations and variables, and each column's name, type, and first few values. In the example below, we apply it to the data frame emp, which was built from vectors of different data types.

In [4]:
# Get the structure of data frame of emp : use str()
str(emp)

'data.frame':	10 obs. of  4 variables:
 $ emp_id  : int  101 102 103 104 105 106 107 108 109 110
 $ emp_name: chr  "Nitesh" "Rahul" "Raj" "Vishal" ...
 $ sal     : num  5000 4500 5600 1800 4200 800 2300 2000 1000 1500
 $ DOJ     : Date, format: "2014-03-23" "2012-09-16" ...


##### Summary of data in Data Frames
In some cases, we need a statistical summary of the data in a data frame. R provides the summary() function for this. It takes the data frame as an argument and returns summary statistics for each column (minimum, quartiles, median, mean, and maximum for numeric and date columns; length and class for character columns). Let's see an example to understand how this function is used in R:

In [5]:
# Print the summary
summary(emp)

     emp_id        emp_name              sal            DOJ            
 Min.   :101.0   Length:10          Min.   : 800   Min.   :2012-09-16  
 1st Qu.:103.2   Class :character   1st Qu.:1575   1st Qu.:2015-03-07  
 Median :105.5   Mode  :character   Median :2150   Median :2018-10-30  
 Mean   :105.5                      Mean   :2870   Mean   :2018-01-30  
 3rd Qu.:107.8                      3rd Qu.:4425   3rd Qu.:2020-11-04  
 Max.   :110.0                      Max.   :5600   Max.   :2021-05-04  

In [6]:
# Extract data from dataframe
# Access emp_name and sal of all employees
data.frame(emp$sal,emp$emp_name) # Builds a temporary data frame from the two columns; emp itself is unchanged

emp.sal,emp.emp_name
<dbl>,<chr>
5000,Nitesh
4500,Rahul
5600,Raj
1800,Vishal
4200,Seema
800,Deepa
2300,Pankaj
2000,Kamal
1000,Leela
1500,Ram


In [7]:
# Second approach to access the data from the data.frame
emp[c("emp_name","sal")]

emp_name,sal
<chr>,<dbl>
Nitesh,5000
Rahul,4500
Raj,5600
Vishal,1800
Seema,4200
Deepa,800
Pankaj,2300
Kamal,2000
Leela,1000
Ram,1500
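
The different access operators return different kinds of objects, which matters when the result is passed on to other functions. The sketch below uses a two-row stand-in for `emp` with made-up contents, so it can be run on its own:

```r
# Two-row stand-in for emp (illustrative values).
emp <- data.frame(emp_name = c("Nitesh", "Rahul"),
                  sal = c(5000, 4500),
                  stringsAsFactors = FALSE)

print(class(emp["sal"]))                   # single brackets keep a data frame
print(class(emp[["sal"]]))                 # double brackets extract the vector
print(class(emp$sal))                      # $ also extracts the vector
print(class(emp[, "sal"]))                 # one column via [row, col] is dropped to a vector
print(class(emp[, "sal", drop = FALSE]))   # drop = FALSE keeps a data frame
```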


In [8]:
# Access the first 3 employee details 
emp[1:3,]  # Slicing

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
1,101,Nitesh,5000,2014-03-23
2,102,Rahul,4500,2012-09-16
3,103,Raj,5600,2014-03-23


In [9]:
# Access all details of the 1st, 5th and 8th employees
emp[c(1,5,8),]

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
1,101,Nitesh,5000,2014-03-23
5,105,Seema,4200,2021-05-04
8,108,Kamal,2000,2018-05-28


In [10]:
# Access emp_name and DOJ of the 1st, 5th and 8th employees
emp[c(1,5,8),c("emp_name","DOJ")]

,emp_name,DOJ
,<chr>,<date>
1,Nitesh,2014-03-23
5,Seema,2021-05-04
8,Kamal,2018-05-28


In [11]:
# To Show first 6 records : use head(dataframe)
head(emp) # By default, head() shows the first 6 rows

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
1,101,Nitesh,5000,2014-03-23
2,102,Rahul,4500,2012-09-16
3,103,Raj,5600,2014-03-23
4,104,Vishal,1800,2018-01-19
5,105,Seema,4200,2021-05-04
6,106,Deepa,800,2020-07-26


In [12]:
# If we want only the first 5 records
head(emp,5)

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
1,101,Nitesh,5000,2014-03-23
2,102,Rahul,4500,2012-09-16
3,103,Raj,5600,2014-03-23
4,104,Vishal,1800,2018-01-19
5,105,Seema,4200,2021-05-04


In [13]:
# To show the last records
tail(emp) # By default, tail() shows the last 6 rows

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
5,105,Seema,4200,2021-05-04
6,106,Deepa,800,2020-07-26
7,107,Pankaj,2300,2019-04-04
8,108,Kamal,2000,2018-05-28
9,109,Leela,1000,2020-12-08
10,110,Ram,1500,2021-02-15


In [14]:
tail(emp,3)

,emp_id,emp_name,sal,DOJ
,<int>,<chr>,<dbl>,<date>
8,108,Kamal,2000,2018-05-28
9,109,Leela,1000,2020-12-08
10,110,Ram,1500,2021-02-15
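
head() and tail() also accept a negative count, which means "everything except that many rows" from the other end. A minimal sketch on a made-up ten-row data frame (`df` is illustrative only):

```r
# Ten-row data frame for illustration.
df <- data.frame(id = 1:10, value = (1:10) * 10)

first_three  <- head(df, 3)    # rows 1 to 3
all_but_last <- head(df, -2)   # drops the last 2 rows
all_but_top  <- tail(df, -2)   # drops the first 2 rows

print(first_three$id)
print(all_but_last$id)
print(all_but_top$id)
```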


In [15]:
# Expand the data frame: a data frame can be expanded by adding columns and rows
# For example, suppose we want to add a new column dept_name to the existing data frame emp:
emp$dept_name=c("Training","Sales","Sales","Marketing","Account","HR","Marketing","Training","HR","Marketing")
print(emp)  # the new column is appended as the last column
# Equivalent: emp["dept_name"] = c("Training","Sales","Sales","Marketing","Account","HR","Marketing","Training","HR","Marketing")

   emp_id emp_name  sal        DOJ dept_name
1     101   Nitesh 5000 2014-03-23  Training
2     102    Rahul 4500 2012-09-16     Sales
3     103      Raj 5600 2014-03-23     Sales
4     104   Vishal 1800 2018-01-19 Marketing
5     105    Seema 4200 2021-05-04   Account
6     106    Deepa  800 2020-07-26        HR
7     107   Pankaj 2300 2019-04-04 Marketing
8     108    Kamal 2000 2018-05-28  Training
9     109    Leela 1000 2020-12-08        HR
10    110      Ram 1500 2021-02-15 Marketing


In [16]:
emp

emp_id,emp_name,sal,DOJ,dept_name
<int>,<chr>,<dbl>,<date>,<chr>
101,Nitesh,5000,2014-03-23,Training
102,Rahul,4500,2012-09-16,Sales
103,Raj,5600,2014-03-23,Sales
104,Vishal,1800,2018-01-19,Marketing
105,Seema,4200,2021-05-04,Account
106,Deepa,800,2020-07-26,HR
107,Pankaj,2300,2019-04-04,Marketing
108,Kamal,2000,2018-05-28,Training
109,Leela,1000,2020-12-08,HR
110,Ram,1500,2021-02-15,Marketing
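
Besides `$` assignment, a column can be added with cbind() or computed from existing columns with transform(). The sketch below assumes a three-row stand-in for `emp`; the `bonus` and `total` columns and the 10% rate are hypothetical, chosen only to illustrate the calls. It also shows that a new column with the wrong number of values is rejected.

```r
# Three-row stand-in for emp (illustrative values).
emp <- data.frame(emp_id = 101:103, sal = c(5000, 4500, 5600))

# Derived column via $ (hypothetical 10% bonus).
emp$bonus <- emp$sal * 0.10

# Append a column with cbind().
emp <- cbind(emp, dept_name = c("Training", "Sales", "Sales"))

# Compute a column from existing ones with transform().
emp <- transform(emp, total = sal + bonus)
print(names(emp))

# A column must have one value per row (or a length that can be recycled).
bad_length <- tryCatch(
  {
    emp$x <- 1:2
    "ok"
  },
  error = function(e) "error: wrong number of values"
)
print(bad_length)
```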


In [17]:
# To add new rows to the existing data frame emp, use the built-in function rbind()
# First create a new data frame for the new joinee
emp_new=data.frame(emp_id=111,emp_name="Jethalal",sal=180000,DOJ=as.Date("2010-10-23"),dept_name="Sales")
print(emp_new)

  emp_id emp_name    sal        DOJ dept_name
1    111 Jethalal 180000 2010-10-23     Sales


In [18]:
# Append the new row to emp
emp = rbind(emp,emp_new)
print(emp)

   emp_id emp_name    sal        DOJ dept_name
1     101   Nitesh   5000 2014-03-23  Training
2     102    Rahul   4500 2012-09-16     Sales
3     103      Raj   5600 2014-03-23     Sales
4     104   Vishal   1800 2018-01-19 Marketing
5     105    Seema   4200 2021-05-04   Account
6     106    Deepa    800 2020-07-26        HR
7     107   Pankaj   2300 2019-04-04 Marketing
8     108    Kamal   2000 2018-05-28  Training
9     109    Leela   1000 2020-12-08        HR
10    110      Ram   1500 2021-02-15 Marketing
11    111 Jethalal 180000 2010-10-23     Sales
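
rbind() on data frames matches columns by name, so the new rows must use the same column names as the existing data frame, although the column order may differ. Column types can also be promoted: an integer column combined with a double value becomes numeric. A minimal sketch with made-up data (`a`, `b`, `ab`, and `mismatch` are hypothetical names):

```r
# Two small data frames with the same column names in a different order.
a <- data.frame(id = 1:2, name = c("x", "y"), stringsAsFactors = FALSE)
b <- data.frame(name = "z", id = 3, stringsAsFactors = FALSE)

ab <- rbind(a, b)      # columns are matched by name
print(ab)
print(class(a$id))     # integer before binding
print(class(ab$id))    # numeric after binding with the double value 3

# Different column names cannot be bound.
mismatch <- tryCatch(
  rbind(a, data.frame(id = 3, nm = "z", stringsAsFactors = FALSE)),
  error = function(e) "error: column names do not match"
)
print(mismatch)
```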


In [19]:
emp

emp_id,emp_name,sal,DOJ,dept_name
<dbl>,<chr>,<dbl>,<date>,<chr>
101,Nitesh,5000,2014-03-23,Training
102,Rahul,4500,2012-09-16,Sales
103,Raj,5600,2014-03-23,Sales
104,Vishal,1800,2018-01-19,Marketing
105,Seema,4200,2021-05-04,Account
106,Deepa,800,2020-07-26,HR
107,Pankaj,2300,2019-04-04,Marketing
108,Kamal,2000,2018-05-28,Training
109,Leela,1000,2020-12-08,HR
110,Ram,1500,2021-02-15,Marketing
111,Jethalal,180000,2010-10-23,Sales


In [20]:
# To show all details of those employees whose salary  > 2000
# Use subset() inbuilt function
subset(emp,sal>2000)
# subset(dataframe_name,condition)

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
1,101,Nitesh,5000,2014-03-23,Training
2,102,Rahul,4500,2012-09-16,Sales
3,103,Raj,5600,2014-03-23,Sales
5,105,Seema,4200,2021-05-04,Account
7,107,Pankaj,2300,2019-04-04,Marketing
11,111,Jethalal,180000,2010-10-23,Sales


In [21]:
# Show all details of employees whose sal > 1800 and sal < 4000 (two conditions on the same column)
# Combine conditions with logical operators: & (and), | (or)
# In R, && compares single values only; for a whole column (many records) use the element-wise &
subset(emp,(sal>1800) & (sal<4000))

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
7,107,Pankaj,2300,2019-04-04,Marketing
8,108,Kamal,2000,2018-05-28,Training
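
The difference between the element-wise and the single-value operators can be seen on a plain vector. A minimal sketch with made-up salaries (the `sal` vector here is a local example, not the notebook's column):

```r
# Illustrative salaries.
sal <- c(5000, 2300, 2000, 800)

# & and | work element by element and return one result per value.
in_range <- (sal > 1800) & (sal < 4000)
extreme  <- (sal < 1000) | (sal > 4000)
print(in_range)
print(extreme)

# && and || are meant for single TRUE/FALSE values, e.g. inside if().
first_is_high <- length(sal) > 0 && sal[1] > 1800
if (first_is_high) print("first salary is above 1800")
```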


In [22]:
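# Show emp_name, sal and DOJ of employees whose salary > 3000
# data.frame() names these columns emp.emp_name, emp.sal and emp.DOJ,
# so the condition must refer to emp.sal rather than sal
df = data.frame(emp$emp_name,emp$sal,emp$DOJ)
subset(df,emp.sal>3000)

,emp.emp_name,emp.sal,emp.DOJ
,<chr>,<dbl>,<date>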
1,Nitesh,5000,2014-03-23
2,Rahul,4500,2012-09-16
3,Raj,5600,2014-03-23
5,Seema,4200,2021-05-04
11,Jethalal,180000,2010-10-23
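
subset() can filter rows and pick columns in one call through its `select` argument, which avoids building an intermediate data frame and keeps the original column names. A minimal sketch on a four-row stand-in for `emp` (values are illustrative):

```r
# Four-row stand-in for emp (illustrative values).
emp <- data.frame(
  emp_id   = 101:104,
  emp_name = c("Nitesh", "Rahul", "Vishal", "Seema"),
  sal      = c(5000, 4500, 1800, 4200),
  DOJ      = as.Date(c("2014-03-23", "2012-09-16", "2018-01-19", "2021-05-04")),
  stringsAsFactors = FALSE
)

# Rows where sal > 3000, restricted to three columns.
high_paid <- subset(emp, sal > 3000, select = c(emp_name, sal, DOJ))
print(high_paid)
```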


In [23]:
# update the records 
# If we want to change the sal of 2nd employee
emp[2,c("sal")]=20000

In [24]:
emp

emp_id,emp_name,sal,DOJ,dept_name
<dbl>,<chr>,<dbl>,<date>,<chr>
101,Nitesh,5000,2014-03-23,Training
102,Rahul,20000,2012-09-16,Sales
103,Raj,5600,2014-03-23,Sales
104,Vishal,1800,2018-01-19,Marketing
105,Seema,4200,2021-05-04,Account
106,Deepa,800,2020-07-26,HR
107,Pankaj,2300,2019-04-04,Marketing
108,Kamal,2000,2018-05-28,Training
109,Leela,1000,2020-12-08,HR
110,Ram,1500,2021-02-15,Marketing
111,Jethalal,180000,2010-10-23,Sales
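
Updating by row position works, but positions shift when rows are added or removed. Selecting the row by a key or a condition is often safer. The sketch below assumes a four-row stand-in for `emp`; the 10% raise for the Sales department is a hypothetical rule used only for illustration.

```r
# Four-row stand-in for emp (illustrative values).
emp <- data.frame(
  emp_id    = 101:104,
  sal       = c(5000, 4500, 5600, 1800),
  dept_name = c("Training", "Sales", "Sales", "HR"),
  stringsAsFactors = FALSE
)

# Update one employee, selected by emp_id rather than by row number.
emp[emp$emp_id == 102, "sal"] <- 20000

# Update every row that meets a condition (hypothetical 10% raise for Sales).
is_sales <- emp$dept_name == "Sales"
emp$sal[is_sales] <- emp$sal[is_sales] * 1.10

print(emp)
```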


In [25]:
# To delete a column
# Suppose we drop the dept_name column (the 5th) from the existing data frame
emp[,-5]                   # Temporary: emp itself is not modified

emp_id,emp_name,sal,DOJ
<dbl>,<chr>,<dbl>,<date>
101,Nitesh,5000,2014-03-23
102,Rahul,20000,2012-09-16
103,Raj,5600,2014-03-23
104,Vishal,1800,2018-01-19
105,Seema,4200,2021-05-04
106,Deepa,800,2020-07-26
107,Pankaj,2300,2019-04-04
108,Kamal,2000,2018-05-28
109,Leela,1000,2020-12-08
110,Ram,1500,2021-02-15
111,Jethalal,180000,2010-10-23
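
Dropping a column by its position breaks if the column order changes. A column can also be removed by name, either without touching the original or permanently by assigning NULL. A minimal sketch on a three-row stand-in for `emp` (values are illustrative):

```r
# Three-row stand-in for emp (illustrative values).
emp <- data.frame(
  emp_id    = 101:103,
  sal       = c(5000, 4500, 5600),
  dept_name = c("Training", "Sales", "Sales"),
  stringsAsFactors = FALSE
)

# Temporary: returns a copy without dept_name; emp is unchanged.
without_dept <- emp[, setdiff(names(emp), "dept_name")]
print(names(without_dept))
print(names(emp))

# Permanent: assigning NULL removes the column from emp.
emp$dept_name <- NULL
print(names(emp))
```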


In [26]:
# To remove the 4th row
emp[-4,]  # Temporary: emp itself is not modified

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
1,101,Nitesh,5000,2014-03-23,Training
2,102,Rahul,20000,2012-09-16,Sales
3,103,Raj,5600,2014-03-23,Sales
5,105,Seema,4200,2021-05-04,Account
6,106,Deepa,800,2020-07-26,HR
7,107,Pankaj,2300,2019-04-04,Marketing
8,108,Kamal,2000,2018-05-28,Training
9,109,Leela,1000,2020-12-08,HR
10,110,Ram,1500,2021-02-15,Marketing
11,111,Jethalal,180000,2010-10-23,Sales


In [27]:
# permanent delete
emp = emp[-4,]
emp

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
1,101,Nitesh,5000,2014-03-23,Training
2,102,Rahul,20000,2012-09-16,Sales
3,103,Raj,5600,2014-03-23,Sales
5,105,Seema,4200,2021-05-04,Account
6,106,Deepa,800,2020-07-26,HR
7,107,Pankaj,2300,2019-04-04,Marketing
8,108,Kamal,2000,2018-05-28,Training
9,109,Leela,1000,2020-12-08,HR
10,110,Ram,1500,2021-02-15,Marketing
11,111,Jethalal,180000,2010-10-23,Sales
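
After a row is deleted, the remaining rows keep their original row names, which is why the output above jumps from 3 to 5. If consecutive numbering is preferred, the row names can be reset by assigning NULL. A minimal sketch on a five-row stand-in for `emp` (values are illustrative):

```r
# Five-row stand-in for emp (illustrative values).
emp <- data.frame(emp_id = 101:105,
                  sal = c(5000, 4500, 5600, 1800, 4200))

emp <- emp[-4, ]          # remove the 4th row
before <- rownames(emp)   # original labels are kept
print(before)

rownames(emp) <- NULL     # reset to consecutive numbering
after <- rownames(emp)
print(after)
```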


In [28]:
# To show those records whose dept_name = "Sales"
subset(emp,dept_name=="Sales")

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
2,102,Rahul,20000,2012-09-16,Sales
3,103,Raj,5600,2014-03-23,Sales
11,111,Jethalal,180000,2010-10-23,Sales
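
To match several departments at once, `%in%` can replace a chain of `==` comparisons, and table() gives a quick count of rows per department. A minimal sketch on a four-row stand-in for `emp` (values are illustrative):

```r
# Four-row stand-in for emp (illustrative values).
emp <- data.frame(
  emp_name  = c("Nitesh", "Rahul", "Deepa", "Raj"),
  dept_name = c("Training", "Sales", "HR", "Sales"),
  stringsAsFactors = FALSE
)

# Rows whose department is either Sales or HR.
sales_or_hr <- subset(emp, dept_name %in% c("Sales", "HR"))
print(sales_or_hr)

# Number of employees in each department.
dept_counts <- table(emp$dept_name)
print(dept_counts)
```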


In [29]:
emp

,emp_id,emp_name,sal,DOJ,dept_name
,<dbl>,<chr>,<dbl>,<date>,<chr>
1,101,Nitesh,5000,2014-03-23,Training
2,102,Rahul,20000,2012-09-16,Sales
3,103,Raj,5600,2014-03-23,Sales
5,105,Seema,4200,2021-05-04,Account
6,106,Deepa,800,2020-07-26,HR
7,107,Pankaj,2300,2019-04-04,Marketing
8,108,Kamal,2000,2018-05-28,Training
9,109,Leela,1000,2020-12-08,HR
10,110,Ram,1500,2021-02-15,Marketing
11,111,Jethalal,180000,2010-10-23,Sales
