# 6. Data Frame

This section will guide you through the process of creating, manipulating, and efficiently working with data frames.

A **data frame** in R is a table-like data structure used for storing data. It is one of the most commonly used data structures in R for data analysis, as it can hold different types of data (numeric, character, factor, etc.) in a rectangular format with rows and columns. Each column in a data frame can be of a different data type, making it similar to a table or spreadsheet. The rows typically represent observations or records, while the columns represent variables or features.

We'll learn about the following topics:

   - [6.1. Creating Data Frames](#Creating_DataFrames)
   - [6.2. Data Frame Indexing and Slicing](#DataFrame_Indexing_and_Slicing)
   - [6.3. Built-in Data Frame Functions](#Builtin_DataFrame_Functions)
   - [6.4. Data Frame Properties](#DataFrame_Properties)

<p align="center">
  <img width="700" height="400" src="https://media.geeksforgeeks.org/wp-content/uploads/20200414224825/f115.png">
</p>

<a name='Creating_DataFrames'></a>

## 6.1. Creating Data Frames:

To create a data frame in R, you use the `data.frame()` function.

In [1]:
#Create a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 35, 25),
  Height = c(165, 175, 170),
  Married = c(TRUE, FALSE, TRUE)
)

#Print the data frame
print(df)

     Name Age Height Married
1   Alice  30    165    TRUE
2     Bob  35    175   FALSE
3 Charlie  25    170    TRUE


To read a data frame from a CSV file in R, you can use the `read.csv()` function. Here’s the general syntax:

`data_frame_name <- read.csv("path/to/your/file.csv", header = TRUE, sep = ",")`

- `data_frame_name`: The name you want to assign to your data frame.
- `"path/to/your/file.csv"`: Replace this with the actual path to your CSV file.
- `header`: Logical value indicating whether the first row of the file contains column names. Set to `TRUE` (default) if your CSV has headers, otherwise set to `FALSE`.
- `sep`: Specifies the character that separates the values in the file. The default is a comma (`,`), but you can change it if your data uses a different separator (e.g., `sep = "\t"` for tab-separated values).
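
As a minimal sketch of the syntax above, the following writes a tiny CSV to a temporary file and reads it back. The file contents and the names `csv_path` and `people` are made up for illustration; in practice you would pass the path to your own file.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")   # hypothetical file location
writeLines(c("Name,Age", "Alice,30", "Bob,35"), csv_path)

# Read it back; header = TRUE takes the column names from the first row
people <- read.csv(csv_path, header = TRUE, sep = ",")

# Inspect the column types that read.csv() inferred
str(people)
```

Calling `str()` right after reading a file is a quick way to confirm that each column was parsed with the type you expect.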

**Matrix vs. Dataframe**

<table>
  <thead>
    <tr>
      <th>Matrix</th>
      <th>Dataframe</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>A collection of elements arranged in a two-dimensional rectangular layout.</td>
      <td>Stores tabular data whose columns (fields) can hold different data types.</td>
    </tr>
    <tr>
      <td>It is an m × n array whose elements all share the same data type.</td>
      <td>It is a list of vectors of equal length. It is a generalized form of a matrix.</td>
    </tr>
    <tr>
      <td>It has a fixed number of rows and columns.</td>
      <td>It has a variable number of rows and columns.</td>
    </tr>
    <tr>
      <td>All columns must hold the same data type.</td>
      <td>Each column can hold its own data type, such as numeric, character, logical, or factor.</td>
    </tr>
    <tr>
      <td>Matrix is homogeneous.</td>
      <td>DataFrames are heterogeneous.</td>
    </tr>
  </tbody>
</table>
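
The homogeneous/heterogeneous distinction in the table can be checked directly. This is a small sketch with made-up values; the names `m` and `d` are illustrative only.

```r
# A matrix holds a single type: mixing strings and numbers coerces everything to character
m <- cbind(Name = c("Alice", "Bob"), Age = c(30, 35))
class(m[, "Age"])    # "character"

# A data frame keeps each column's own type
d <- data.frame(Name = c("Alice", "Bob"), Age = c(30, 35))
class(d$Age)         # "numeric"
```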


<a name='DataFrame_Indexing_and_Slicing'></a>

## 6.2. Data Frame Indexing and Slicing:

You can access individual columns, rows, or elements within a data frame using several different methods:

**1. Access by Column Name:**

In [2]:
#Access the "Age" column
df$Age

In [3]:
df[["Name"]]

In [4]:
df[["Name"]][2]
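
A point that often causes confusion is the difference between double and single brackets. The sketch below, using a reduced copy of the example data, suggests how to check what each form returns; `age_vec` and `age_df` are illustrative names.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 35, 25)
)

age_vec <- df[["Age"]]   # a plain numeric vector, same as df$Age
age_df  <- df["Age"]     # still a data frame, with one column

class(age_vec)  # "numeric"
class(age_df)   # "data.frame"
```

Use `[[ ]]` or `$` when you need the values themselves, and `[ ]` when the result should remain a data frame.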

**2. Access by Indexing:**

In [5]:
#Access the first row
df[1, ]

   Name Age Height Married
1 Alice  30    165    TRUE


In [6]:
#Access the first column
df[, 1]

In R, you can use negative indexing to exclude specific elements from a data structure, such as a data frame.

In [7]:
#Exclude the first column
df[, -1]

  Age Height Married
1  30    165    TRUE
2  35    175   FALSE
3  25    170    TRUE


In [8]:
#Access a specific element (row 2, column 3)
df[2, 3]

In [9]:
#Select rows 2 and 4 and columns 1 and 2; row 4 does not exist, so R returns a row of NA values
df[c(2,4), c(1,2)]

   Name Age
2   Bob  35
NA <NA>  NA


In [10]:
#Select rows 2 to 3 and columns 1 to 2
df[2:3, 1:2]

     Name Age
2     Bob  35
3 Charlie  25


In [11]:
#Select the rows where Age is greater than 25
df[df$Age > 25, ]

   Name Age Height Married
1 Alice  30    165    TRUE
2   Bob  35    175   FALSE
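
Logical filtering can be combined with column selection in a single indexing step. The following is a minimal sketch on the same example data; `result` is an illustrative name.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 35, 25),
  Height = c(165, 175, 170),
  Married = c(TRUE, FALSE, TRUE)
)

# Rows where Age > 25 AND Married is TRUE, keeping only two columns
result <- df[df$Age > 25 & df$Married, c("Name", "Age")]
print(result)
```

Here `&` combines the two conditions element by element; use `|` for "or".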


<a name='Builtin_DataFrame_Functions'></a>

## 6.3. Built-in Data Frame Functions:

<table>
  <thead>
    <tr>
      <th>Function</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>as.data.frame()</td>
      <td>Converts a list to a data frame.</td>
    </tr>
    <tr>
      <td>str()</td>
      <td>Displays the structure of the data frame, including data types and a preview of each column.</td>
    </tr>
    <tr>
      <td>nrow() and ncol()</td>
      <td>Returns the number of rows and columns in a data frame.</td>
    </tr>
    <tr>
      <td>dim()</td>
      <td>Returns the dimensions of the data frame (number of rows and columns).</td>
    </tr>
    <tr>
      <td>colnames() and rownames()</td>
      <td>Get or set the column and row names of the data frame.</td>
    </tr>
    <tr>
      <td>head() and tail()</td>
      <td>Displays the first few or last few rows of the data frame.</td>
    </tr>
    <tr>
      <td>summary()</td>
      <td>Provides summary statistics for each column in the data frame.</td>
    </tr>
    <tr>
      <td>subset()</td>
      <td>Subsets a data frame based on conditions.</td>
    </tr>
    <tr>
      <td>merge()</td>
      <td>Merges two data frames by common columns or row names.</td>
    </tr>
    <tr>
      <td>rbind() and cbind()</td>
      <td>Binds data frames by rows or columns.</td>
    </tr>
    <tr>
      <td>apply()</td>
      <td>Applies a function to rows or columns of the data frame.</td>
    </tr>
    <tr>
      <td>is.data.frame()</td>
      <td>Checks if an object is a data frame.</td>
    </tr>
    <tr>
      <td>order()</td>
      <td>Returns the row indices that sort a data frame by one or more columns.</td>
    </tr>
    <tr>
      <td>duplicated() and unique()</td>
      <td>Finds duplicate rows or returns unique rows.</td>
    </tr>
    <tr>
      <td>transform()</td>
      <td>Adds new columns or transforms existing columns.</td>
    </tr>
  </tbody>
</table>

**`as.data.frame()`**: Converts a list to a data frame.

In [12]:
lst <- list(Name = c("Alice", "Bob"), Age = c(25, 30))

df_lst <- as.data.frame(lst)

print(df_lst)

   Name Age
1 Alice  25
2   Bob  30


**`str()`**: Displays the structure of the data frame, including data types and a preview of each column.

In [13]:
str(df)

'data.frame':	3 obs. of  4 variables:
 $ Name   : chr  "Alice" "Bob" "Charlie"
 $ Age    : num  30 35 25
 $ Height : num  165 175 170
 $ Married: logi  TRUE FALSE TRUE


**`nrow()`** and **`ncol()`**: Returns the number of rows and columns in a data frame.

In [14]:
nrow(df)
ncol(df)

**`dim()`**: Returns the dimensions of the data frame (number of rows and columns).

In [15]:
dim(df)

**`colnames()`** and **`rownames()`**: Get or set the column and row names of the data frame.

In [16]:
colnames(df)
rownames(df)
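
The same functions can also set names when used on the left side of an assignment. This is a small sketch; the new names (`AgeYears`, `a`, `b`, `c`) are arbitrary, and the work is done on a copy called `renamed` so the original data frame is unchanged.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(30, 35, 25))

renamed <- df                          # work on a copy so df keeps its original names
colnames(renamed)[2] <- "AgeYears"     # rename only the second column
rownames(renamed) <- c("a", "b", "c")  # replace the default row names 1, 2, 3

print(renamed)
```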

**`head()`** and **`tail()`**: Displays the first few or last few rows of the data frame.

In [17]:
head(df)  #Returns the first 6 rows (or fewer if the frame is smaller)
tail(df)  #Returns the last 6 rows

     Name Age Height Married
1   Alice  30    165    TRUE
2     Bob  35    175   FALSE
3 Charlie  25    170    TRUE

     Name Age Height Married
1   Alice  30    165    TRUE
2     Bob  35    175   FALSE
3 Charlie  25    170    TRUE
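
Because the example data frame has only three rows, `head()` and `tail()` return the same thing above. Passing the `n` argument makes the difference visible, as in this short sketch (`first_two` and `last_one` are illustrative names).

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(30, 35, 25))

first_two <- head(df, n = 2)  # first two rows
last_one  <- tail(df, n = 1)  # last row only

print(first_two)
print(last_one)
```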


**`summary()`**: Provides summary statistics for each column in the data frame.

In [18]:
summary(df)

     Name                Age           Height       Married       
 Length:3           Min.   :25.0   Min.   :165.0   Mode :logical  
 Class :character   1st Qu.:27.5   1st Qu.:167.5   FALSE:1        
 Mode  :character   Median :30.0   Median :170.0   TRUE :2        
                    Mean   :30.0   Mean   :170.0                  
                    3rd Qu.:32.5   3rd Qu.:172.5                  
                    Max.   :35.0   Max.   :175.0                  

**`subset()`**: Subsets a data frame based on conditions.

In [19]:
subset(df, Age > 25)

   Name Age Height Married
1 Alice  30    165    TRUE
2   Bob  35    175   FALSE


In [20]:
subset(df, Height != 170)

   Name Age Height Married
1 Alice  30    165    TRUE
2   Bob  35    175   FALSE
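
`subset()` can also choose columns through its `select` argument. A minimal sketch on the same example data follows; `tall_married` is an illustrative name and the threshold of 160 is arbitrary.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 35, 25),
  Height = c(165, 175, 170),
  Married = c(TRUE, FALSE, TRUE)
)

# Keep married people taller than 160, returning only Name and Height
tall_married <- subset(df, Married & Height > 160, select = c(Name, Height))
print(tall_married)
```

Inside `subset()`, column names can be written without the `df$` prefix, which keeps conditions short.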


**`merge()`**: Merges two data frames by common columns or row names.

In [21]:
df1 <- data.frame(ID = c(1, 2), Name = c("Alice", "Bob"))
df2 <- data.frame(ID = c(1, 2), Age = c(25, 30))
merge(df1, df2, by = "ID")

  ID  Name Age
1  1 Alice  25
2  2   Bob  30
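
When the key columns do not match exactly, the `all`, `all.x`, and `all.y` arguments control which rows are kept. The sketch below uses made-up data in which ID 3 appears only in the first frame and ID 4 only in the second.

```r
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(1, 2, 4), Age = c(25, 30, 40))

inner <- merge(df1, df2, by = "ID")                # only IDs present in both (1 and 2)
left  <- merge(df1, df2, by = "ID", all.x = TRUE)  # every row of df1; Age is NA for ID 3
outer <- merge(df1, df2, by = "ID", all = TRUE)    # every ID from either frame

print(outer)
```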


**`rbind()`** and **`cbind()`**: Binds data frames by rows or columns.

In [22]:
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
df2 <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))
rbind(df1, df2)

     Name Age
1   Alice  25
2     Bob  30
3 Charlie  35
4   David  40
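
The example above covers `rbind()`. For completeness, here is a small sketch of `cbind()`, which places data frames side by side; the `City` column and its values are made up for illustration.

```r
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
extra <- data.frame(City = c("Paris", "Oslo"))  # hypothetical extra column

# Both inputs must have the same number of rows
combined <- cbind(df1, extra)
print(combined)
```

Note that `cbind()` matches rows by position, not by a key column; use `merge()` when rows need to be matched by value.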


**`apply()`**: Applies a function to rows or columns of the data frame.

In [23]:
df[, -1]

  Age Height Married
1  30    165    TRUE
2  35    175   FALSE
3  25    170    TRUE


In [24]:
apply(df['Age'], 2, mean)
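
The second argument of `apply()` selects the direction: `1` for rows and `2` for columns. Since `apply()` converts its input to a matrix, it is safest to pass only columns of one type. The sketch below assumes a reduced copy of the example data; `num_cols`, `col_means`, and `row_sums` are illustrative names.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(30, 35, 25),
  Height = c(165, 175, 170)
)

# Keep only the numeric columns before calling apply()
num_cols <- df[, c("Age", "Height")]

col_means <- apply(num_cols, 2, mean)  # one value per column: 30 and 170
row_sums  <- apply(num_cols, 1, sum)   # one value per row: 195, 210, 195

print(col_means)
print(row_sums)
```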

**`is.data.frame()`**: Checks if an object is a data frame.

In [25]:
is.data.frame(df)

**`order()`**: Returns the row indices that sort a data frame by one or more columns; use them to index the rows.

In [26]:
#Sorts the data frame by the Age column
df[order(df$Age), ]

     Name Age Height Married
3 Charlie  25    170    TRUE
1   Alice  30    165    TRUE
2     Bob  35    175   FALSE
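
Sorting in descending order or by more than one column follows the same pattern. This sketch adds a hypothetical fourth person, "Dana", so that two rows share the same age and the tie-breaking column has an effect.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),  # "Dana" is made up for this example
  Age = c(30, 35, 25, 30),
  Height = c(165, 175, 170, 160)
)

# Negating a numeric column sorts it in descending order
by_age_desc <- df[order(-df$Age), ]

# Sort by Age first, then by Height within equal ages
by_age_then_height <- df[order(df$Age, df$Height), ]

print(by_age_desc)
print(by_age_then_height)
```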


**`duplicated()`** and **`unique()`**: Finds duplicate rows or returns unique rows.

In [27]:
duplicated(df)
unique(df)

     Name Age Height Married
1   Alice  30    165    TRUE
2     Bob  35    175   FALSE
3 Charlie  25    170    TRUE
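
The example data frame has no repeated rows, so `unique(df)` returns it unchanged. The sketch below uses a made-up frame, `dup_df`, that does contain a duplicate, to show what both functions do.

```r
dup_df <- data.frame(
  Name = c("Alice", "Bob", "Alice"),
  Age = c(30, 35, 30)
)

# TRUE marks a row that repeats an earlier row
dup_flags <- duplicated(dup_df)   # FALSE FALSE TRUE

# Keep only the first occurrence of each row
deduped <- unique(dup_df)

print(dup_flags)
print(deduped)
```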


**`transform()`**: Adds new columns or transforms existing columns.

In [28]:
df <- transform(df, AgeInMonths = Age * 12)

In [29]:
df

     Name Age Height Married AgeInMonths
1   Alice  30    165    TRUE         360
2     Bob  35    175   FALSE         420
3 Charlie  25    170    TRUE         300


<a name='DataFrame_Properties'></a>

## 6.4. Data Frame Properties:

**Mutability**: Refers to whether an object can be modified after it has been created. For data frames (and other data structures such as lists, matrices, and vectors), this means adding, removing, or modifying elements after creation.

- **Add a New Column**:

In [30]:
df$Weight <- c(65, 75, 85)
print(df)

     Name Age Height Married AgeInMonths Weight
1   Alice  30    165    TRUE         360     65
2     Bob  35    175   FALSE         420     75
3 Charlie  25    170    TRUE         300     85


- **Remove a Column**:

In [31]:
df$Weight <- NULL
print(df)

     Name Age Height Married AgeInMonths
1   Alice  30    165    TRUE         360
2     Bob  35    175   FALSE         420
3 Charlie  25    170    TRUE         300


- **Add a New Row**:

In [32]:
new_row <- data.frame(Name = "David", Age = 40, Height = 180, Married = FALSE, AgeInMonths = 480)
df <- rbind(df, new_row)
print(df)

     Name Age Height Married AgeInMonths
1   Alice  30    165    TRUE         360
2     Bob  35    175   FALSE         420
3 Charlie  25    170    TRUE         300
4   David  40    180   FALSE         480
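
Two further modifications follow the same idea: changing a single value and removing a row. This is a minimal sketch on a reduced copy of the example data; the new age of 36 is arbitrary.

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(30, 35, 25))

# Modify one element: set Bob's age to 36
df[df$Name == "Bob", "Age"] <- 36

# Remove a row: keep every row except Charlie's
df <- df[df$Name != "Charlie", ]

print(df)
```

Removing rows is done by reassigning a filtered copy to the same name, since there is no row equivalent of `df$Weight <- NULL`.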
