# Chapter 4: Working with Data – Collections

In the previous chapters, you learned how to store individual pieces of data using variables of various types. But what happens when you need to work with a group of related items? For example, you might need a list of student names, a set of temperatures recorded throughout the day, or a mapping of product IDs to prices. Storing each item in a separate variable would be impractical and error‑prone.

This is where **collections** come in. Collections are data structures designed to hold multiple elements, allowing you to add, remove, find, and iterate over them efficiently. In this chapter, you’ll explore the most important collection types in C#:

- **Arrays** – the simplest collection, with a fixed size.
- **Generic collections** like `List<T>` and `Dictionary<TKey, TValue>` – flexible, type‑safe, and dynamically resizable.
- An introduction to **LINQ (Language Integrated Query)** – a powerful set of methods that lets you query collections in a declarative way.
- The `IEnumerable` interface – the foundation that enables `foreach` loops and LINQ.

By the end, you’ll be able to store, manage, and query groups of data effectively.

---

## 4.1 Arrays – The Foundation

An **array** is a fixed‑size, ordered collection of elements, all of the same type. Arrays are fundamental and appear in almost every language.

### Declaring and Initializing Arrays

You declare an array by placing square brackets after the element type.

```csharp
int[] numbers;               // declaration – no memory allocated yet
numbers = new int[5];        // initialization – creates an array of 5 integers, all default (0)
```

You can combine declaration and initialization:

```csharp
int[] numbers = new int[5];       // all elements are 0
string[] names = new string[3];   // all elements are null (strings are reference types)
```

You can also initialize an array with values using an array initializer:

```csharp
int[] primes = new int[] { 2, 3, 5, 7, 11 };

// The same array, written more concisely (a new name, since a variable can be declared only once per scope):
int[] primesShort = { 2, 3, 5, 7, 11 };
```

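In both forms, the element type comes from the declared type `int[]`, and every value in the braces must be convertible to `int`. The shorter form without `new` can only be used when declaring the variable; you cannot write `primes = { 2, 3 };` later.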
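Both declarations above state the type `int[]` explicitly. If you would rather have the compiler work out the element type from the values themselves, C# also offers implicitly typed arrays written `new[] { ... }`. A small sketch; the variable names are illustrative, and it is written as top-level statements, so it assumes C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;

// With new[] the element type is inferred from the values in the braces
var primes = new[] { 2, 3, 5, 7, 11 };   // int[]
var words = new[] { "two", "three" };    // string[]

Console.WriteLine(primes.GetType());     // System.Int32[]
Console.WriteLine(words.GetType());      // System.String[]
Console.WriteLine(primes.Length);        // 5

// "var x = { 1, 2 };" does not compile: var needs new[] (or an explicit array type)
```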

### Accessing Array Elements

Elements are accessed by a zero‑based index using square brackets.

```csharp
int[] numbers = { 10, 20, 30, 40, 50 };
Console.WriteLine(numbers[0]);   // 10
Console.WriteLine(numbers[2]);   // 30

numbers[1] = 99;                  // change the second element
Console.WriteLine(numbers[1]);   // 99
```

If you try to access an index outside the valid range (e.g., `numbers[5]` in a 5‑element array), you'll get an `IndexOutOfRangeException`.
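A common way to avoid this exception is to compare the index against `Length` before using it. The sketch below (values and variable names are illustrative; top-level statements, C# 9 or later) shows both the check and what happens without it:

```csharp
// Top-level statements: assumes C# 9 or later
using System;

int[] numbers = { 10, 20, 30, 40, 50 };
int requested = 5;   // one past the last valid index, which is 4

// Guard: valid indices run from 0 to Length - 1
if (requested >= 0 && requested < numbers.Length)
    Console.WriteLine(numbers[requested]);
else
    Console.WriteLine($"Index {requested} is outside 0..{numbers.Length - 1}");

// Without the guard, the runtime throws
bool threw = false;
try
{
    Console.WriteLine(numbers[requested]);
}
catch (IndexOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw);   // True
```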

### Array Length

The `Length` property tells you how many elements the array holds.

```csharp
int[] numbers = { 10, 20, 30 };
for (int i = 0; i < numbers.Length; i++)
{
    Console.WriteLine(numbers[i]);
}
```

### Iterating Over Arrays

You already know the `for` loop. But arrays also work perfectly with `foreach`:

```csharp
int[] numbers = { 10, 20, 30 };
foreach (int num in numbers)
{
    Console.WriteLine(num);
}
```

`foreach` is often more readable and avoids off‑by‑one errors with indices. However, the iteration variable is read‑only: you cannot assign to it, so a `foreach` loop cannot replace the array's elements. To modify elements, use a `for` loop and assign through the index.
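Here is a minimal sketch of the `for` loop approach, doubling every element in place (the array contents are illustrative; top-level statements, C# 9 or later). The commented-out `foreach` version is the one the compiler rejects:

```csharp
// Top-level statements: assumes C# 9 or later
using System;

int[] numbers = { 10, 20, 30 };

// foreach (int num in numbers) { num *= 2; }
// ^ does not compile: num is a read-only iteration variable

// With a for loop you have the index, so you can write back into the array
for (int i = 0; i < numbers.Length; i++)
{
    numbers[i] *= 2;
}

Console.WriteLine(string.Join(", ", numbers));   // 20, 40, 60
```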

### Multi‑Dimensional Arrays

C# supports two kinds of multi‑dimensional arrays: **rectangular** and **jagged**.

#### Rectangular Arrays

A rectangular array has multiple dimensions, and each dimension has a fixed length. They are declared using commas inside the square brackets.

```csharp
// 2‑dimensional array (matrix) with 3 rows and 4 columns
int[,] matrix = new int[3, 4];

// Initialize with values
int[,] matrix2 = {
    { 1, 2, 3, 4 },
    { 5, 6, 7, 8 },
    { 9, 10, 11, 12 }
};

// Access elements
int value = matrix2[1, 2]; // row 1, column 2 -> 7
matrix2[0, 0] = 99;
```

The `GetLength(dimension)` method returns the length of a specific dimension.

```csharp
for (int i = 0; i < matrix2.GetLength(0); i++)
{
    for (int j = 0; j < matrix2.GetLength(1); j++)
    {
        Console.Write($"{matrix2[i, j]} ");
    }
    Console.WriteLine();
}
```
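Besides `GetLength`, a rectangular array exposes `Rank` (the number of dimensions) and `Length` (the total number of elements across all dimensions), and a `foreach` over it visits every element, row by row. A short sketch reusing the values from `matrix2` above (top-level statements, C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;

int[,] matrix = {
    { 1, 2, 3, 4 },
    { 5, 6, 7, 8 },
    { 9, 10, 11, 12 }
};

Console.WriteLine(matrix.Rank);          // 2: number of dimensions
Console.WriteLine(matrix.GetLength(0));  // 3: rows
Console.WriteLine(matrix.GetLength(1));  // 4: columns
Console.WriteLine(matrix.Length);        // 12: total number of elements

// foreach walks the whole matrix in row-major order (row 0 first, then row 1, ...)
int sum = 0;
foreach (int value in matrix)
{
    sum += value;
}
Console.WriteLine(sum);                  // 78
```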

#### Jagged Arrays

A jagged array is an array of arrays. Each sub‑array can have a different length. They are declared with multiple square brackets.

```csharp
int[][] jagged = new int[3][]; // 3 rows, but columns are not yet defined

jagged[0] = new int[] { 1, 2 };
jagged[1] = new int[] { 3, 4, 5 };
jagged[2] = new int[] { 6 };

Console.WriteLine(jagged[1][2]); // 5
```

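Jagged arrays are more flexible, but they take more care: after `new int[3][]`, each row is `null` until you assign an array to it, and loops must use each row's own `Length` rather than a single column count.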
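Because each row of a jagged array is a separate array, nested loops take the inner bound from `jagged[row].Length` instead of `GetLength`. A sketch using the same values as above, this time with an array initializer (top-level statements, C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;

int[][] jagged =
{
    new[] { 1, 2 },
    new[] { 3, 4, 5 },
    new[] { 6 }
};

int total = 0;
for (int row = 0; row < jagged.Length; row++)
{
    // Each row has its own length
    for (int col = 0; col < jagged[row].Length; col++)
    {
        Console.Write($"{jagged[row][col]} ");
        total += jagged[row][col];
    }
    Console.WriteLine();
}
Console.WriteLine($"Total: {total}");   // Total: 21
```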

### Useful Array Methods

The `Array` class provides static methods for common operations:

```csharp
int[] numbers = { 5, 2, 8, 1, 9 };
Array.Sort(numbers);          // sorts in place: { 1, 2, 5, 8, 9 }
Array.Reverse(numbers);        // reverses: { 9, 8, 5, 2, 1 }
int index = Array.IndexOf(numbers, 5); // finds index of 5 -> 2
```

**When to use arrays:** Arrays are best when you have a fixed number of elements and you need maximum performance or interoperability with other languages. For most everyday scenarios, the more flexible collections described next are preferred.
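To see what "fixed size" means in practice: `Array.Resize` does not grow an array in place. It allocates a new array, copies the elements, and updates the variable you pass by `ref`. The sketch below (illustrative values; top-level statements, C# 9 or later) also shows that `Array.IndexOf` returns -1 for a missing value:

```csharp
// Top-level statements: assumes C# 9 or later
using System;

int[] numbers = { 1, 2, 5, 8, 9 };
Console.WriteLine(Array.IndexOf(numbers, 7));   // -1: not found

int[] original = numbers;
Array.Resize(ref numbers, 7);                    // new array; the extra slots are 0

Console.WriteLine(numbers.Length);               // 7
Console.WriteLine(original.Length);              // 5: the old array is unchanged
Console.WriteLine(object.ReferenceEquals(numbers, original));   // False
Console.WriteLine(numbers[6]);                   // 0
```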

---

## 4.2 The `foreach` Loop Deep Dive

You've seen `foreach` in action. Now let's understand why it works with arrays and other collections. The `foreach` loop is syntactic sugar over the **enumerator pattern**. Any type that implements the `IEnumerable` interface (or its generic counterpart `IEnumerable<T>`) can be used in a `foreach` loop. The compiler translates:

```csharp
foreach (int num in numbers)
{
    Console.WriteLine(num);
}
```

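into something like the following (shown for `numbers` as a `List<int>`; for arrays, the compiler generates a plain indexed loop instead):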

```csharp
IEnumerator<int> enumerator = numbers.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        int num = enumerator.Current;
        Console.WriteLine(num);
    }
}
finally
{
    enumerator.Dispose();
}
```

The key point: `foreach` works with any collection that exposes a `GetEnumerator` method. All the collections we'll discuss (arrays, `List<T>`, `Dictionary<TKey, TValue>`) support this.

**Important:** Do not change a `List<T>` or `Dictionary<TKey, TValue>` (for example, by adding elements) while a `foreach` loop is iterating over it. The change invalidates the enumerator, and the loop throws an `InvalidOperationException` when it tries to move to the next element. If you need to modify a collection while iterating, use a `for` loop or iterate over a copy.
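The sketch below triggers the exception on purpose and then shows the copy-based workaround. List contents and variable names are illustrative; it uses top-level statements (C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };

bool threw = false;
try
{
    foreach (int n in numbers)
    {
        if (n == 2)
            numbers.Add(99);   // modifies the list while it is being enumerated
    }
}
catch (InvalidOperationException)
{
    threw = true;              // raised when the loop tries to move to the next element
}
Console.WriteLine(threw);      // True

// Iterating over a copy (ToList) makes removing from the original safe
var scores = new List<int> { 1, 2, 3, 4 };
foreach (int s in scores.ToList())
{
    if (s % 2 == 0)
        scores.Remove(s);
}
Console.WriteLine(string.Join(", ", scores));   // 1, 3
```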

---

## 4.3 Generic Collections: `List<T>` and `Dictionary<TKey, TValue>`

The `System.Collections.Generic` namespace contains a wealth of collection classes that are **type‑safe** and **dynamic**. Unlike arrays, they grow and shrink as needed.

### `List<T>` – The Dynamic Array

`List<T>` is the most commonly used collection. It behaves like an array but can resize automatically.

#### Creating and Adding Elements

```csharp
using System.Collections.Generic; // implicit in .NET 6+ projects that enable ImplicitUsings

List<string> names = new List<string>();
names.Add("Alice");
names.Add("Bob");
names.Add("Charlie");

// You can also initialize with a collection initializer
List<int> scores = new List<int> { 95, 87, 100, 72 };
```

#### Accessing Elements

You can access elements by index, just like an array:

```csharp
string firstName = names[0]; // "Alice"
names[1] = "Robert";          // change "Bob" to "Robert"
```

The `Count` property gives the number of elements.

```csharp
Console.WriteLine($"There are {names.Count} names.");
```

#### Inserting and Removing

```csharp
List<string> names = new List<string> { "Alice", "Bob", "Charlie" };

names.Insert(1, "David");     // inserts "David" at index 1: Alice, David, Bob, Charlie
names.Remove("Bob");          // removes the first occurrence of "Bob": Alice, David, Charlie
names.RemoveAt(0);            // removes the element at index 0: David, Charlie
names.Clear();                // removes all elements: Count is now 0
```

#### Searching

```csharp
bool hasAlice = names.Contains("Alice");
int index = names.IndexOf("Charlie"); // returns -1 if not found
```
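A slightly fuller sketch of searching (names are illustrative; top-level statements, C# 9 or later). Note that `Contains` and `IndexOf` on a `List<string>` compare strings exactly, so letter case matters, and that checking for -1 is the usual way to act only on a found element:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

var names = new List<string> { "Alice", "David", "Charlie" };

Console.WriteLine(names.Contains("Alice"));   // True
Console.WriteLine(names.Contains("alice"));   // False: the comparison is case-sensitive
Console.WriteLine(names.IndexOf("Charlie"));  // 2
Console.WriteLine(names.IndexOf("Bob"));      // -1

// Look up the index, then act only if the item was found
int index = names.IndexOf("David");
if (index >= 0)
{
    names[index] = "Dave";
}
Console.WriteLine(string.Join(", ", names));  // Alice, Dave, Charlie
```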

#### Iterating

You can use `for` or `foreach`:

```csharp
foreach (string name in names)
{
    Console.WriteLine(name);
}

for (int i = 0; i < names.Count; i++)
{
    Console.WriteLine(names[i]);
}
```

#### Capacity and Performance

`List<T>` internally uses an array to store elements. When you add an element and the internal array is full, the list allocates a new, larger array (typically doubling the capacity) and copies the elements over. This is efficient for most scenarios, but if you know the final size in advance, you can set the capacity to avoid resizes:

```csharp
List<int> numbers = new List<int>(100); // initial capacity of 100
```

You can also trim excess capacity with `numbers.TrimExcess()`.
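The sketch below makes the internal resizing visible by printing `Capacity` whenever it changes. The exact growth steps are an implementation detail, so no output is claimed for that part; on current .NET versions you can expect the capacity to roughly double each time. Variable names are illustrative; top-level statements, C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

var growing = new List<int>();
int lastCapacity = growing.Capacity;
for (int i = 1; i <= 20; i++)
{
    growing.Add(i);
    // Report only when the list has switched to a larger internal array
    if (growing.Capacity != lastCapacity)
    {
        Console.WriteLine($"Count {growing.Count}: capacity {growing.Capacity}");
        lastCapacity = growing.Capacity;
    }
}

var presized = new List<int>(100);      // room for 100 items before any resize
for (int i = 0; i < 10; i++)
{
    presized.Add(i);
}
Console.WriteLine(presized.Capacity);   // 100

// Count (10) is well below 90% of Capacity, so TrimExcess shrinks it to 10
presized.TrimExcess();
Console.WriteLine(presized.Capacity);   // 10
```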

### `Dictionary<TKey, TValue>` – Key‑Value Pairs

A dictionary stores a collection of **key‑value pairs**, where each key is unique and used to look up the associated value. This is ideal for scenarios like looking up a product by its ID, or counting word frequencies.

#### Creating and Adding

```csharp
Dictionary<string, int> ages = new Dictionary<string, int>();
ages.Add("Alice", 30);  // Add throws an ArgumentException if the key already exists
ages.Add("Bob", 25);
ages["Charlie"] = 35;   // the indexer adds a new key, or overwrites the value of an existing one
```
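The difference between `Add` and the indexer matters when a key is already present. A minimal sketch (illustrative data; top-level statements, C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

var ages = new Dictionary<string, int>();
ages.Add("Alice", 30);

// Add refuses a key that is already present
bool threw = false;
try
{
    ages.Add("Alice", 31);
}
catch (ArgumentException)
{
    threw = true;
}

// The indexer either inserts a new key or overwrites an existing value
ages["Alice"] = 31;
ages["Bob"] = 25;

Console.WriteLine(threw);          // True
Console.WriteLine(ages["Alice"]);  // 31
Console.WriteLine(ages.Count);     // 2
```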

You can also use a collection initializer:

```csharp
Dictionary<string, int> ages = new Dictionary<string, int>
{
    { "Alice", 30 },
    { "Bob", 25 },
    { "Charlie", 35 }
};
```

#### Accessing Values

Use the indexer with the key:

```csharp
int aliceAge = ages["Alice"]; // 30
```

If the key doesn't exist, the indexer throws a `KeyNotFoundException`. To safely attempt retrieval, use `TryGetValue`:

```csharp
if (ages.TryGetValue("David", out int age))
{
    Console.WriteLine($"David is {age} years old.");
}
else
{
    Console.WriteLine("David not found.");
}
```

#### Checking for a Key

```csharp
bool hasAlice = ages.ContainsKey("Alice");
bool hasAge30 = ages.ContainsValue(30); // slower: scans every value, unlike the key lookup above
```

#### Iterating Over a Dictionary

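You can iterate over the keys, the values, or the key‑value pairs (as `KeyValuePair<TKey, TValue>`). Don't rely on the order in which a dictionary returns its entries; it is not guaranteed, so sort explicitly if order matters.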

```csharp
foreach (KeyValuePair<string, int> pair in ages)
{
    Console.WriteLine($"{pair.Key} is {pair.Value} years old.");
}

// Or use var for brevity
foreach (var pair in ages)
{
    Console.WriteLine($"{pair.Key} is {pair.Value} years old.");
}

// Just keys
foreach (string name in ages.Keys)
{
    Console.WriteLine(name);
}

// Just values
foreach (int age in ages.Values)
{
    Console.WriteLine(age);
}
```

#### Removing

```csharp
ages.Remove("Bob"); // removes the entry with key "Bob"
```
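`Remove` does not throw when the key is missing; it returns a `bool` telling you whether anything was removed. A small sketch (illustrative data; top-level statements, C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

var ages = new Dictionary<string, int>
{
    { "Alice", 30 },
    { "Bob", 25 }
};

bool removedBob = ages.Remove("Bob");      // the key was present
bool removedDavid = ages.Remove("David");  // no such key, and no exception

Console.WriteLine(removedBob);     // True
Console.WriteLine(removedDavid);   // False
Console.WriteLine(ages.Count);     // 1
```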

### Other Useful Collections

- `HashSet<T>` – an unordered collection of unique elements. Fast for membership tests.
- `Queue<T>` – first‑in, first‑out (FIFO) collection.
- `Stack<T>` – last‑in, first‑out (LIFO) collection.
- `LinkedList<T>` – doubly linked list (less common, but useful for frequent insertions/removals in the middle).

We'll focus on `List<T>` and `Dictionary<TKey, TValue>` as they are the most widely used.
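For a quick taste of three of the other types, the sketch below shows the behavior each bullet describes: duplicates ignored by `HashSet<T>`, FIFO order for `Queue<T>`, and LIFO order for `Stack<T>`. The values are illustrative; top-level statements, C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

// HashSet<T>: duplicates are ignored, membership tests are fast
var tags = new HashSet<string> { "csharp", "dotnet" };
bool addedAgain = tags.Add("csharp");   // already present, so nothing is added
Console.WriteLine(addedAgain);          // False
Console.WriteLine(tags.Count);          // 2

// Queue<T>: first in, first out
var line = new Queue<string>();
line.Enqueue("first");
line.Enqueue("second");
Console.WriteLine(line.Dequeue());      // first

// Stack<T>: last in, first out
var undo = new Stack<string>();
undo.Push("type A");
undo.Push("type B");
Console.WriteLine(undo.Pop());          // type B
```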

---

## 4.4 Introduction to LINQ (Language Integrated Query)

**LINQ** is one of the most powerful features in C#. It allows you to write declarative queries against collections (and many other data sources) using a syntax that resembles SQL or functional programming constructs. LINQ methods are part of the `System.Linq` namespace.

### Setting Up

To use LINQ, you need to include the namespace:

```csharp
using System.Linq;
```

Most LINQ methods are extension methods that work on any type implementing `IEnumerable<T>` – which includes arrays, `List<T>`, `Dictionary<TKey, TValue>` (though dictionary returns key‑value pairs), and many others.

### Key LINQ Methods

We'll cover the most essential ones: `Where`, `Select`, and `First`/`FirstOrDefault`.

#### `Where` – Filtering

`Where` returns a sequence containing only the elements that satisfy a given condition (a predicate). It does not copy the data into a new list; the filtering happens as you iterate over the result.

```csharp
List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Get all even numbers
IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

foreach (int even in evens)
{
    Console.WriteLine(even); // 2,4,6,8,10
}
```

The lambda expression `n => n % 2 == 0` is a function that takes an integer `n` and returns `true` if it's even. LINQ methods accept such predicates.
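Because a lambda is just a function, you can also store it in a variable of a delegate type such as `Func<int, bool>` ("takes an `int`, returns a `bool`") and pass that variable to `Where`. A sketch; `isEven` and the list contents are illustrative, and it uses top-level statements (C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

// A lambda stored in a variable: takes an int, returns a bool
Func<int, bool> isEven = n => n % 2 == 0;

Console.WriteLine(isEven(4));   // True
Console.WriteLine(isEven(7));   // False

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
List<int> evens = numbers.Where(isEven).ToList();   // the same predicate, reused
Console.WriteLine(string.Join(", ", evens));        // 2, 4, 6
```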

#### `Select` – Projection

`Select` transforms each element in the collection into a new form. For example, you might want to extract a property or compute a new value.

```csharp
List<string> names = new List<string> { "alice", "bob", "charlie" };

// Get the lengths of each name
IEnumerable<int> nameLengths = names.Select(name => name.Length);

foreach (int len in nameLengths)
{
    Console.WriteLine(len); // 5,3,7
}

// Convert to uppercase
IEnumerable<string> upperNames = names.Select(name => name.ToUpper());
```

You can also project to anonymous types (useful when you need a few properties):

```csharp
var nameInfo = names.Select(name => new { Name = name, Length = name.Length });
foreach (var info in nameInfo)
{
    Console.WriteLine($"{info.Name} has {info.Length} letters.");
}
```

#### `First` and `FirstOrDefault` – Getting a Single Element

- `First()` returns the first element of a collection. If the collection is empty, it throws `InvalidOperationException`.
- `FirstOrDefault()` returns the first element or the default value for the type (e.g., `null` for reference types, `0` for numeric types) if the collection is empty.

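You can also pass a predicate to get the first element matching a condition. If no element matches, `First` throws `InvalidOperationException`, while `FirstOrDefault` returns the default value.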

```csharp
List<int> numbers = new List<int> { 10, 20, 30, 40 };

int first = numbers.First();                // 10
int firstEven = numbers.First(n => n % 2 == 0); // 10 (first even)
int firstOver100 = numbers.FirstOrDefault(n => n > 100); // 0 (default int)
```
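The sketch below shows the failure cases side by side, including one pitfall: for value types, `FirstOrDefault` returns `0` both when it finds a real `0` and when nothing matches. Data and variable names are illustrative; top-level statements, C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 10, 20, 30, 40 };
var names = new List<string> { "Alice", "Bob" };

// First with a predicate throws when nothing matches
bool threw = false;
try
{
    _ = numbers.First(n => n > 100);
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine(threw);            // True

// FirstOrDefault falls back to default(T): null for string
string? noMatch = names.FirstOrDefault(n => n.StartsWith("Z"));
Console.WriteLine(noMatch is null);  // True

// For int the default is 0, which looks the same as a real 0 in the data
var withZero = new List<int> { 0, 5 };
Console.WriteLine(withZero.FirstOrDefault(n => n < 1));   // 0 (an actual element)
Console.WriteLine(withZero.FirstOrDefault(n => n > 9));   // 0 (no match)
```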

For dictionaries, you might use these methods on the `Keys` or `Values` collections, but it's more common to use `TryGetValue` for key lookups.

### Query Syntax vs. Method Syntax

LINQ provides two forms: **method syntax** (the one we just used, with extension methods) and **query syntax** (which resembles SQL). Query syntax is translated by the compiler into method calls. Here's an example of query syntax:

```csharp
var evens = from n in numbers
            where n % 2 == 0
            select n;
```

The compiler translates this to `numbers.Where(n => n % 2 == 0)`; a final `select n` that returns each element unchanged adds no `Select` call. Most developers use method syntax because it's more consistent with the rest of C#, but query syntax can be more readable for complex queries. We'll stick to method syntax in this book.
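To compare the two forms on a query that does more than filter, the sketch below writes the same filter, sort, and projection both ways and checks that they produce the same sequence. The data is illustrative; top-level statements, C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 5, 2, 8, 1, 9, 4 };

// Query syntax
var querySyntax = from n in numbers
                  where n % 2 == 0
                  orderby n
                  select n * 10;

// The equivalent chain of method calls
var methodSyntax = numbers.Where(n => n % 2 == 0)
                          .OrderBy(n => n)
                          .Select(n => n * 10);

Console.WriteLine(string.Join(", ", querySyntax));           // 20, 40, 80
Console.WriteLine(querySyntax.SequenceEqual(methodSyntax));  // True
```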

### Deferred Execution

An important concept in LINQ is **deferred execution**: many LINQ methods (like `Where` and `Select`) do not execute immediately. They return an `IEnumerable<T>` that represents the query, and the actual work is delayed until you iterate over the results (e.g., with `foreach` or by calling `ToList()`). This avoids work whose results you never use (for instance, if you stop iterating early) and lets you build up a query step by step before running it.

```csharp
List<int> numbers = new List<int> { 1, 2, 3, 4 };
var query = numbers.Where(n => n % 2 == 0); // query not executed yet

numbers.Add(5);
numbers.Add(6); // we modify the source

foreach (int even in query) // execution happens here
{
    Console.WriteLine(even); // 2,4,6 – includes the newly added 6
}
```

If you want to force immediate execution, you can convert the result to a list or array:

```csharp
List<int> evens = numbers.Where(n => n % 2 == 0).ToList();
```

Now `evens` is a separate list, and subsequent changes to `numbers` won't affect it.
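One way to observe deferred execution directly is to count how often the predicate runs. In the sketch below (the counter `predicateCalls` is purely illustrative; top-level statements, C# 9 or later), defining the query runs nothing, `ToList()` runs it once over four elements, and enumerating the query again after adding an element runs it over all five:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };
int predicateCalls = 0;

var deferred = numbers.Where(n => { predicateCalls++; return n % 2 == 0; });
Console.WriteLine(predicateCalls);                 // 0: nothing has run yet

List<int> snapshot = deferred.ToList();            // runs the query now (4 calls)
numbers.Add(6);

Console.WriteLine(string.Join(", ", snapshot));    // 2, 4
Console.WriteLine(string.Join(", ", deferred));    // 2, 4, 6 (the query runs again)
Console.WriteLine(predicateCalls);                 // 9: 4 for ToList + 5 for the second run
```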

### Combining LINQ Methods

You can chain LINQ methods to build powerful queries:

```csharp
List<string> names = new List<string> { "Alice", "Bob", "Charlie", "David", "Eve" };

// Get names longer than 3 characters, in uppercase, sorted
var result = names
    .Where(name => name.Length > 3)
    .Select(name => name.ToUpper())
    .OrderBy(name => name);

foreach (string name in result)
{
    Console.WriteLine(name); // ALICE, CHARLIE, DAVID
}
```

### LINQ with Dictionaries

LINQ works with dictionaries as well. A dictionary is an `IEnumerable<KeyValuePair<TKey, TValue>>`, so you can query over the pairs.

```csharp
Dictionary<string, int> ages = new Dictionary<string, int>
{
    { "Alice", 30 },
    { "Bob", 25 },
    { "Charlie", 35 }
};

// Find all people older than 28
var olderThan28 = ages.Where(pair => pair.Value > 28)
                      .Select(pair => pair.Key);

foreach (string name in olderThan28)
{
    Console.WriteLine(name); // Alice, Charlie
}
```

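Or you can query just the keys or values. Note that this version looks up each value again through the indexer, so querying the pairs directly is usually simpler and faster: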

```csharp
IEnumerable<string> youngPeople = ages.Keys.Where(name => ages[name] < 30);
```
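Two more patterns that come up often with dictionaries: building a new dictionary from filtered pairs with `ToDictionary`, and sorting the pairs explicitly when you need a predictable order. A sketch with the same illustrative data (top-level statements, C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

var ages = new Dictionary<string, int>
{
    { "Alice", 30 },
    { "Bob", 25 },
    { "Charlie", 35 }
};

// Filter the pairs and build a new dictionary from the ones that remain
Dictionary<string, int> under30 = ages
    .Where(pair => pair.Value < 30)
    .ToDictionary(pair => pair.Key, pair => pair.Value);

// Sort explicitly when order matters
var namesByAge = ages.OrderBy(pair => pair.Value)
                     .Select(pair => pair.Key);

Console.WriteLine(under30.Count);                  // 1
Console.WriteLine(under30["Bob"]);                 // 25
Console.WriteLine(string.Join(", ", namesByAge));  // Bob, Alice, Charlie
```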

---

## 4.5 Putting It All Together: A Practical Example

Let's build a small program that reads a list of students and their grades, stores them in a dictionary, and then performs various queries using LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace StudentGrades
{
    class Program
    {
        static void Main(string[] args)
        {
            // Store student grades in a dictionary
            Dictionary<string, int> studentGrades = new Dictionary<string, int>();

            // Input loop
            while (true)
            {
                Console.Write("Enter student name (or 'quit' to finish): ");
                string? name = Console.ReadLine();
                if (string.IsNullOrWhiteSpace(name) || name.Trim().Equals("quit", StringComparison.OrdinalIgnoreCase))
                    break;

                Console.Write($"Enter grade for {name}: ");
                if (int.TryParse(Console.ReadLine(), out int grade))
                {
                    // Add or update the grade
                    studentGrades[name] = grade;
                }
                else
                {
                    Console.WriteLine("Invalid grade; this entry was skipped.");
                }
            }

            // Display all students and grades
            Console.WriteLine("\n--- All Students ---");
            foreach (var pair in studentGrades)
            {
                Console.WriteLine($"{pair.Key}: {pair.Value}");
            }

            // LINQ queries
            if (studentGrades.Any()) // Any checks if collection has elements
            {
                // Average grade
                double average = studentGrades.Values.Average();
                Console.WriteLine($"\nAverage grade: {average:F2}");

                // Highest grade
                int highest = studentGrades.Values.Max();
                var topStudents = studentGrades.Where(p => p.Value == highest)
                                               .Select(p => p.Key);
                Console.WriteLine($"Highest grade: {highest} achieved by {string.Join(", ", topStudents)}");

                // Students with grade above average
                var aboveAverage = studentGrades.Where(p => p.Value > average)
                                                .Select(p => p.Key);
                Console.WriteLine($"Students above average: {string.Join(", ", aboveAverage)}");

                // Students ordered by name
                Console.WriteLine("\nStudents ordered by name:");
                foreach (var name in studentGrades.Keys.OrderBy(n => n))
                {
                    Console.WriteLine(name);
                }

                // Students ordered by grade (descending)
                Console.WriteLine("\nStudents ordered by grade (highest first):");
                var byGrade = studentGrades.OrderByDescending(p => p.Value);
                foreach (var pair in byGrade)
                {
                    Console.WriteLine($"{pair.Key}: {pair.Value}");
                }
            }
            else
            {
                Console.WriteLine("No students entered.");
            }

            Console.WriteLine("\nPress any key to exit...");
            Console.ReadKey();
        }
    }
}
```

**Explanation:**

- We use a `Dictionary<string, int>` to store student names and grades.
- A `while` loop repeatedly prompts for input until the user types "quit" or enters a blank name.
- We use `studentGrades[name] = grade;` to add or update an entry. The indexer adds the key if it doesn't exist.
- After input, we use various LINQ methods:
  - `Any()` checks if the dictionary is non‑empty.
  - `Values.Average()` computes the average of all grades.
  - `Values.Max()` finds the maximum grade.
  - `Where` with a predicate to find all students with that grade.
  - `Select` to extract just the names.
  - `string.Join` to concatenate names into a single string.
  - `Keys.OrderBy` to sort the student names alphabetically.
  - `OrderByDescending` on the key‑value pairs to sort by grade.

This example demonstrates real‑world usage: collecting data, storing it in an appropriate collection, and then querying it with LINQ.
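To experiment with the queries without typing input each time, you can run them against hard-coded data. The names and grades below are made up for illustration, the results are sorted so they print predictably, and the code uses top-level statements (C# 9 or later):

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data instead of console input
var studentGrades = new Dictionary<string, int>
{
    { "Alice", 92 },
    { "Bob", 78 },
    { "Charlie", 92 },
    { "Dana", 65 }
};

double average = studentGrades.Values.Average();   // (92 + 78 + 92 + 65) / 4 = 81.75
int highest = studentGrades.Values.Max();          // 92

List<string> topStudents = studentGrades
    .Where(p => p.Value == highest)
    .Select(p => p.Key)
    .OrderBy(name => name)
    .ToList();

List<string> aboveAverage = studentGrades
    .Where(p => p.Value > average)
    .Select(p => p.Key)
    .OrderBy(name => name)
    .ToList();

Console.WriteLine(string.Join(", ", topStudents));    // Alice, Charlie
Console.WriteLine(string.Join(", ", aboveAverage));   // Alice, Charlie
```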

---

## 4.6 Common Pitfalls and Best Practices

### 1. Choosing the Right Collection

- Use `List<T>` for an ordered list of items that you'll access by index or iterate over.
- Use `Dictionary<TKey, TValue>` for fast lookups by a unique key.
- Use `HashSet<T>` when you need to store unique items and test membership.
- Use arrays when the size is absolutely fixed and you need maximum performance or interop.

### 2. Null Reference Exceptions with Collections

A variable of a collection type can be `null` if it was never assigned, and calling a method on it then throws a `NullReferenceException`. Prefer initializing at declaration:

```csharp
private readonly List<string> names = new List<string>(); // readonly: can't be reassigned to null later
```

### 3. Modifying Collections During Iteration

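As mentioned, adding or removing items while a `foreach` loop iterates over a `List<T>` (or most other collections) throws an `InvalidOperationException`. If you need to modify the collection, either:

- Use a `for` loop, iterating backwards when you remove items so that each removal only shifts elements you have already visited.
- Create a copy of the collection to iterate over (e.g., `foreach (var item in list.ToList())`).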
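A sketch of removing all even numbers by walking backwards, followed by `List<T>.RemoveAll`, which does the same filtering in one call and returns how many items it removed. The data is illustrative; top-level statements, C# 9 or later:

```csharp
// Top-level statements: assumes C# 9 or later
using System;
using System.Collections.Generic;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// Walk backwards: RemoveAt(i) shifts only the elements after i,
// and those have already been visited
for (int i = numbers.Count - 1; i >= 0; i--)
{
    if (numbers[i] % 2 == 0)
        numbers.RemoveAt(i);
}
Console.WriteLine(string.Join(", ", numbers));   // 1, 3, 5

// RemoveAll filters in one call and returns the number of removed items
var more = new List<int> { 1, 2, 3, 4, 5, 6 };
int removed = more.RemoveAll(n => n % 2 == 0);
Console.WriteLine(removed);                      // 3
Console.WriteLine(string.Join(", ", more));      // 1, 3, 5
```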

### 4. LINQ Performance

LINQ is convenient, but it can hide performance costs. For example, calling the `Count()` extension method on an `IEnumerable<T>` that is not a collection may enumerate the entire sequence just to count it. When you have a `List<T>`, prefer its `Count` property (or `Length` for arrays). Also be mindful of multiple enumerations: if you write `if (query.Any()) ... foreach (var x in query) ...`, the query starts over from the beginning in the `foreach`. Consider materializing it once with `ToList()` if you need to reuse the results.

### 5. Dictionary Key Lookup

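Use `TryGetValue` when you're not sure whether a key exists. The indexer throws a `KeyNotFoundException` for a missing key, and exceptions are relatively expensive. `TryGetValue` checks for the key and retrieves the value in a single lookup, whereas `ContainsKey` followed by the indexer looks the key up twice.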

### 6. Using `var` with LINQ

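It's common to use `var` for LINQ results because the types can be long or even impossible to write out (a query that projects to an anonymous type returns a sequence of a type that has no name). When a result is materialized into a concrete type with `ToList()`, writing the type explicitly (e.g., `List<string>`) can make the code clearer, though `var` remains a reasonable choice.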

---

## 4.7 Chapter Summary

In this chapter, you've learned how to work with groups of data using C# collections:

- **Arrays** – fixed‑size, zero‑based indexed collections, useful for simple scenarios.
- **`List<T>`** – dynamic, resizable list, the go‑to collection for most situations.
- **`Dictionary<TKey, TValue>`** – key‑value pairs for fast lookups.
- **`foreach` loop** – the idiomatic way to iterate over collections, and how it works under the hood with `IEnumerable`.
- **LINQ** – a declarative way to query collections using methods like `Where`, `Select`, `FirstOrDefault`, and many others. You learned about deferred execution and how to compose queries.
- A practical example tying everything together.

With these tools, you can now handle data in bulk – filter, transform, aggregate, and sort. Collections and LINQ are used constantly in real‑world C# development, so mastering them is a major step forward.

In the next chapter, **Methods – The Building Blocks of Logic**, we'll explore how to encapsulate code into reusable units, pass data in and out, and design clean, modular programs. You'll learn about method signatures, parameters (including `ref`, `out`, `in`), overloading, and more. Get ready to elevate your code organization!

**Exercises:**

1. Write a program that stores a list of integers entered by the user. When the user enters 0, stop and display the sum, average, and maximum of the numbers (using LINQ).
2. Create a dictionary that maps country codes (e.g., "us", "fr") to country names. Write a loop that asks the user for a code and prints the corresponding country, handling unknown codes gracefully.
3. Given an array of strings, use LINQ to find all strings that start with a vowel (a, e, i, o, u) and convert them to uppercase.
4. Write a method that takes a `List<int>` and returns a new list with only the unique numbers (hint: use `Distinct` LINQ method).
5. Build a simple word counter: read a sentence from the user, split it into words, and use a dictionary to count how many times each word appears. Display the results.

Now, onward to Chapter 5!

