from pyspark.sql import SparkSession, Row
from pyspark.sql import functions as func

spark = SparkSession.builder.appName("friendsByAge").getOrCreate()

lines = spark.read.option("header","true").option("inferSchema","true").csv("file:///fakeFriends.csv")


people = lines.select(lines.age,lines.friends)

avg_friends = people.groupBy("age").avg("freinds").show()

avg_friends_sorted = people.groupBy("age").avg("friends").sort("age").show()

avg_friends_formatted = people.groupBy("age").agg(func.round(func.avg("friends"),2)).sort("age").show()

avg_friends_aliased_and_formatted = people.groupBy("age").agg(func.round(func.avg("friends"),2).alias("avg_friends")).sort("age").show()

spark.stop()

---

### Code Breakdown

1. **Importing Libraries**:
   ```python
   from pyspark.sql import SparkSession, Row
   from pyspark.sql import functions as func
   ```
   - `SparkSession`: Entry point to use Spark SQL and DataFrame APIs.
   - `Row`: Allows the creation of row objects (not directly used in this code).
   - `functions as func`: Imports PySpark SQL functions, such as `round` and `avg`.

2. **Creating a Spark Session**:
   ```python
   spark = SparkSession.builder.appName("friendsByAge").getOrCreate()
   ```
   - `SparkSession.builder`: A class attribute (accessed without parentheses) that returns a builder for configuring the session.
   - `appName("friendsByAge")`: Names the Spark application "friendsByAge".
   - `getOrCreate()`: Returns an existing Spark session or creates a new one.

3. **Reading the CSV File**:
   ```python
   lines = spark.read.option("header","true").option("inferSchema","true").csv("file:///fakeFriends.csv")
   ```
   - Reads a CSV file located at `file:///fakeFriends.csv`; the `file://` scheme points Spark at the local filesystem.
   - `option("header", "true")`: Indicates that the first row contains column headers.
   - `option("inferSchema", "true")`: Automatically infers the data types of columns, at the cost of an extra pass over the data.

4. **Selecting Relevant Columns**:
   ```python
   people = lines.select(lines.age, lines.friends)
   ```
   - Extracts only the `age` and `friends` columns from the dataset.

5. **Calculating Average Friends**:
   ```python
   avg_friends = people.groupBy("age").avg("freinds").show()
   ```
   - Groups data by the `age` column.
   - Intended to calculate the average number of friends for each age, but `avg("freinds")` misspells the column name. Spark cannot resolve `"freinds"` and raises an `AnalysisException`; the column should be `"friends"`.

6. **Sorting and Displaying Averages**:
   ```python
   avg_friends_sorted = people.groupBy("age").avg("friends").sort("age").show()
   ```
   - Groups data by `age`.
   - Calculates the average number of friends.
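   - Sorts the results by `age` in ascending order before displaying them.
   - `.show()` prints the table and returns `None`, so `avg_friends_sorted` holds `None` rather than a DataFrame.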

7. **Formatting the Averages**:
   ```python
   avg_friends_formatted = people.groupBy("age").agg(func.round(func.avg("friends"),2)).sort("age").show()
   ```
   - Groups data by `age`.
   - Calculates the average number of friends.
   - Rounds the average to two decimal places using `func.round`.
   - Sorts the results by `age`.

8. **Aliasing and Formatting**:
   ```python
   avg_friends_aliased_and_formatted = people.groupBy("age").agg(func.round(func.avg("friends"),2).alias("avg_friends")).sort("age").show()
   ```
   - Same as the previous step but renames the rounded average column to `avg_friends` using `.alias()`.

9. **Stopping the Spark Session**:
   ```python
   spark.stop()
   ```
   - Closes the Spark session to release resources.

---

### Sample Dataset: `fakeFriends.csv`
```csv
name,age,friends
John,30,150
Doe,20,200
Jane,30,120
Smith,20,250
Alice,40,300
Bob,30,100
```
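
As a sanity check on the averages for this sample, the per-age figures can be computed by hand in plain Python, without Spark. This is only a minimal sketch: it embeds the sample rows as a string instead of reading a file, and the names `SAMPLE_CSV` and `average_friends_by_age` are made up for illustration and are not part of the original script.

```python
import csv
import io
from collections import defaultdict

# The sample rows from fakeFriends.csv, embedded so the sketch needs no file.
SAMPLE_CSV = """name,age,friends
John,30,150
Doe,20,200
Jane,30,120
Smith,20,250
Alice,40,300
Bob,30,100
"""


def average_friends_by_age(csv_text):
    """Return {age: average friends rounded to 2 places}, ordered by age."""
    friends_by_age = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        friends_by_age[int(row["age"])].append(int(row["friends"]))
    return {
        age: round(sum(counts) / len(counts), 2)
        for age, counts in sorted(friends_by_age.items())
    }


averages = average_friends_by_age(SAMPLE_CSV)
print(averages)  # {20: 225.0, 30: 123.33, 40: 300.0}
```

Age 30 has three rows (150, 120, 100), so its average is 370 / 3, which rounds to 123.33.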

---

### Step-by-Step Execution

1. **Load the CSV File**:
   ```python
   +-----+---+-------+
   | name|age|friends|
   +-----+---+-------+
   | John| 30|    150|
   |  Doe| 20|    200|
   | Jane| 30|    120|
   |Smith| 20|    250|
   |Alice| 40|    300|
   |  Bob| 30|    100|
   +-----+---+-------+
   ```

2. **Select Relevant Columns**:
   ```python
   +---+-------+
   |age|friends|
   +---+-------+
   | 30|    150|
   | 20|    200|
   | 30|    120|
   | 20|    250|
   | 40|    300|
   | 30|    100|
   +---+-------+
   ```

3. **Average Friends (with Typo)**:
   ```python
   avg_friends = people.groupBy("age").avg("freinds").show()
   ```
   - Raises an `AnalysisException` because the DataFrame has no column named `"freinds"`. Correct it to `"friends"`.

4. **Average Friends (Corrected)**:
   ```python
   +---+------------------+
   |age|      avg(friends)|
   +---+------------------+
   | 20|             225.0|
   | 30|123.33333333333333|
   | 40|             300.0|
   +---+------------------+
   ```
   - Without `.sort()`, the row order is not guaranteed and may differ between runs.

5. **Sorted Averages**:
   ```python
   +---+------------------+
   |age|      avg(friends)|
   +---+------------------+
   | 20|             225.0|
   | 30|123.33333333333333|
   | 40|             300.0|
   +---+------------------+
   ```

6. **Formatted Averages**:
   ```python
   +---+----------------------+
   |age|round(avg(friends), 2)|
   +---+----------------------+
   | 20|                 225.0|
   | 30|                123.33|
   | 40|                 300.0|
   +---+----------------------+
   ```

7. **Aliased and Formatted Averages**:
   ```python
   +---+-----------+
   |age|avg_friends|
   +---+-----------+
   | 20|      225.0|
   | 30|     123.33|
   | 40|      300.0|
   +---+-----------+
   ```

---

### Key Takeaways
1. **DataFrame Operations**: Use `.select()`, `.groupBy()`, `.agg()`, and `.sort()` to manipulate and analyze data.
2. **Column Names**: A misspelled column name (e.g., `"freinds"` instead of `"friends"`) raises an `AnalysisException`, so check names against the DataFrame's schema.
3. **Formatting**: Use `func.round()` to round numerical results to a fixed number of decimal places.
4. **Aliasing**: Use `.alias()` to rename columns for better readability.
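
One way to catch a misspelled column name such as `"freinds"` before Spark does is to compare the requested names against the DataFrame's column list. The sketch below is plain Python and `require_columns` is a hypothetical helper, not a PySpark API; in the script above it could be called with `lines.columns` as the first argument.

```python
import difflib


def require_columns(available, required):
    """Raise ValueError if any required column is missing.

    Hypothetical helper, not a PySpark API. `available` is a list of
    column names, e.g. `lines.columns` for a DataFrame named `lines`.
    """
    for name in required:
        if name not in available:
            # Suggest the closest existing name to help spot misspellings.
            hints = difflib.get_close_matches(name, available)
            suffix = f" Did you mean {hints[0]!r}?" if hints else ""
            raise ValueError(f"Unknown column {name!r}.{suffix}")


columns = ["name", "age", "friends"]
require_columns(columns, ["age", "friends"])  # passes silently

try:
    require_columns(columns, ["age", "freinds"])
except ValueError as err:
    message = str(err)
    print(message)  # Unknown column 'freinds'. Did you mean 'friends'?
```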
