### GROUP BY, HAVING and COUNT

Now that you can select raw data, you're ready to learn **how to group** your data and **count things** within those groups.

This can help you answer questions like:

- How many of each kind of fruit has our store sold?
- How many species of animal has the vet office treated?

##### COUNT
- Returns a count of things. If you pass it the name of a column, it returns the number of non-NULL entries in that column.
- COUNT() is an example of an aggregate function, which takes many values and returns one.

##### GROUP BY
- Takes the name of one or more columns and **treats all rows with the same value in those columns as a single group** when you apply aggregate functions such as COUNT().


##### GROUP BY ... HAVING
- HAVING is used in combination with GROUP BY to ignore groups that don't meet certain criteria.

Here is an example. Suppose we have this `pets` table:

|  ID   |        Name        | Animal |
| :---: | :----------------: | :----: |
|   1   | Dr. Harris Bonkers | Rabbit |
|   2   |        Moon        |  Dog   |
|   3   |       Ripley       |  Cat   |
|   4   |        Tom         |  Cat   |

In [1]:
# First, you can SELECT the result of COUNT() directly.
# COUNT() produces a new column in the result.
QUERY = """
    SELECT COUNT(ID)
    FROM `bigquery-public-data.pet_records.pets`
"""
# It's going to return something like this
response = {
    'f0_': 4
}
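
To try the COUNT query above without BigQuery access, here is a minimal sketch that rebuilds the example table in an in-memory SQLite database. SQLite, the local table name `pets`, and the alias `num_pets` are assumptions made for illustration; they are not part of the original BigQuery dataset.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

# COUNT(ID) collapses every row of the table into a single value.
# The alias num_pets (hypothetical) replaces a default name such as f0_.
num_pets = conn.execute("SELECT COUNT(ID) AS num_pets FROM pets").fetchone()[0]
print(num_pets)  # prints 4

conn.close()
```

Naming the column with `AS` makes the result easier to read than a generated name.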

In [2]:
# And here is an example that combines all three techniques.

# Say we want to know how many of each type of animal
# we have in the pets table.

# We can use GROUP BY to group together rows that have the same
# value in the Animal column, and COUNT() to find out how many
# IDs are in each group. HAVING then keeps only the groups
# whose count is greater than 1.

QUERY = """
    SELECT Animal, COUNT(ID)
    FROM `bigquery-public-data.pet_records.pets`
    GROUP BY Animal
    HAVING COUNT(ID) > 1
"""
# Only Cat is returned; every other animal has just 1 record.
response = {
    'Animal': 'Cat',
    'f0_': 2
}
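
The effect of HAVING is easier to see when the grouped result is printed with and without it. The sketch below assumes the same four example rows loaded into an in-memory SQLite table named `pets` (a local stand-in, not the BigQuery table); the `ORDER BY` is added only to make the printed order predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

# GROUP BY alone: one result row per distinct value of Animal.
all_groups = conn.execute(
    "SELECT Animal, COUNT(ID) FROM pets GROUP BY Animal ORDER BY Animal"
).fetchall()

# GROUP BY ... HAVING: groups with a count of 1 are dropped.
big_groups = conn.execute(
    "SELECT Animal, COUNT(ID) FROM pets GROUP BY Animal HAVING COUNT(ID) > 1"
).fetchall()

print(all_groups)  # [('Cat', 2), ('Dog', 1), ('Rabbit', 1)]
print(big_groups)  # [('Cat', 2)]

conn.close()
```

HAVING filters groups after aggregation, which is why it can test `COUNT(ID)`.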

##### Important ⚠️

Let's consider these two queries.

- The first has two variables: **parent** and **id**
- The second has three variables: **author**, **parent** and **id**

In [3]:
query_good = """
    SELECT parent, COUNT(id)
    FROM `bigquery-public-data.hacker_news.comments`
    GROUP BY parent
"""
query_bad = """
    SELECT author, parent, COUNT(id)
    FROM `bigquery-public-data.hacker_news.comments`
    GROUP BY parent
"""

The second one is bad because **author** is neither passed to GROUP BY nor wrapped in an aggregate function such as COUNT().

From Kaggle: https://www.kaggle.com/dansbecker/group-by-having-count#Note-on-using-GROUP-BY

Note on using **GROUP BY**

Note that because it tells SQL how to apply aggregate functions, it doesn't make sense to use GROUP BY without an aggregate function. Similarly, if you have any GROUP BY clause, then all variables must be passed to either a

- **GROUP BY** command, or
- an **aggregation** function.
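
Following this rule, `query_bad` can be repaired in two ways: add **author** to GROUP BY, or wrap it in an aggregate function. The sketch below shows both against a tiny made-up `comments` table in an in-memory SQLite database; the rows, the author names, and the use of SQLite are assumptions for illustration only. SQLite itself tolerates a bare column like `author`, so the sketch runs only the two repaired forms and does not try to reproduce the error.

```python
import sqlite3

# Assumption: a tiny invented table stands in for hacker_news.comments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER, author TEXT, parent INTEGER)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        (1, "alice", 100),
        (2, "bob", 100),
        (3, "alice", 100),
        (4, "carol", 200),
    ],
)

# Fix A: every non-aggregated column in SELECT also appears in GROUP BY.
# Result: number of comments per (author, parent) pair.
fix_group_by = conn.execute(
    """
    SELECT author, parent, COUNT(id)
    FROM comments
    GROUP BY author, parent
    ORDER BY author, parent
    """
).fetchall()

# Fix B: author is passed to an aggregate function instead.
# Result: per parent, the number of distinct authors and of comments.
fix_aggregate = conn.execute(
    """
    SELECT parent, COUNT(DISTINCT author), COUNT(id)
    FROM comments
    GROUP BY parent
    ORDER BY parent
    """
).fetchall()

print(fix_group_by)   # [('alice', 100, 2), ('bob', 100, 1), ('carol', 200, 1)]
print(fix_aggregate)  # [(100, 2, 3), (200, 1, 1)]

conn.close()
```

The two fixes answer different questions, so the right one depends on whether you want a row per author or a row per parent.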

##### Example with real dataset