Chapter 14: Integrating Suggestions, Restructure Sections, and Update Exercise #201
This reverts commit a9648a5.
A `DataFrame` is an object for storing related columns of data.
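As a minimal sketch (the column names and values are hypothetical placeholders, not data from the lecture), a `DataFrame` can be built from a dictionary that maps each column name to that column's values:

```python
import pandas as pd

# Hypothetical columns: each key becomes a column, each list holds its values
df = pd.DataFrame({
    "x": [1, 5, 3],
    "y": [4, 2, 6],
})

print(df)
print(df.columns.tolist())  # ['x', 'y']
print(df.shape)             # (3, 2): 3 rows, 2 columns
```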
Let's start with Series.
We begin by creating a series with four random observations.
Reviewer suggestion: "We begin by creating a series with ..."
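A minimal sketch of that step, assuming NumPy's seeded `default_rng` as the random source and a hypothetical series name `observations`:

```python
import numpy as np
import pandas as pd

# Seeded generator so the four "random observations" are reproducible
rng = np.random.default_rng(seed=42)

# A Series is a single labelled column; the name is a hypothetical label
s = pd.Series(rng.standard_normal(4), name="observations")

print(s)
print(s.index.tolist())  # default integer index: [0, 1, 2, 3]
```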
### Select Data by Position
In practice, one thing we do all the time is find, select, and work with the subset of a dataframe that interests us.
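Selecting by position is done with `.iloc`, which takes integer row and column positions. A small sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "x": [10, 20, 30, 40],
    "y": [1.5, 2.5, 3.5, 4.5],
    "z": ["a", "b", "c", "d"],
})

# .iloc selects by integer position: rows 1 and 2, columns 0 and 1
# (slices exclude the stop position, as with Python lists)
subset = df.iloc[1:3, 0:2]
print(subset)
```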
Real-world datasets can be [enormous](https://developers.google.com/machine-learning/data-prep/construct/collect/data-size-quality).
It is sometimes desirable to work with a subset of data to enhance computational efficiency, reduce redundancy, and save space.
Reviewer comment: The grammar in this sentence is not quite right. Maybe "...to enhance computational efficiency, reduce redundancy and save space"?
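To make the space argument concrete, here is a sketch on a hypothetical random dataset (10,000 rows, 10 float columns) comparing the memory footprint of the full frame with a two-column subset:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: 10,000 rows and 10 float columns
rng = np.random.default_rng(seed=0)
full = pd.DataFrame(rng.standard_normal((10_000, 10)),
                    columns=[f"var{i}" for i in range(10)])

# Keep only the two columns we actually need
subset = full[["var0", "var1"]]

# memory_usage reports bytes per column (plus the index); sum for the total
full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()
print(full_bytes, subset_bytes)
```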
This function can be a built-in function like `max`, a `lambda` function, or a user-defined function.
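A sketch of the lambda and user-defined cases with `.apply()`, using hypothetical data and a hypothetical helper `value_range`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3], "y": [4, 2, 6]})  # hypothetical data

# A lambda function, applied to each column (axis=0 is the default)
doubled_max = df.apply(lambda col: 2 * col.max())

# A user-defined function (hypothetical name): the spread of each column
def value_range(col):
    return col.max() - col.min()

ranges = df.apply(value_range)
print(doubled_max)
print(ranges)
```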
Here is an example using the `max` function.
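A minimal sketch, on a small hypothetical frame, of passing the built-in `max` to `.apply()` column by column and then row by row:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3], "y": [4, 2, 6]})  # hypothetical data

# Built-in max applied to each column (the default, axis=0)
col_max = df.apply(max)

# ...and to each row (axis=1)
row_max = df.apply(max, axis=1)

print(col_max)
print(row_max)
```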
Here, `df.apply()` returns a series of boolean values, one per row, indicating whether that row satisfies the condition specified in the if-else statement.
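A sketch of this pattern with a hypothetical condition (`x > 2` and `y < 6`): the row-wise function returns `True` or `False`, and the resulting boolean series can then be used to filter rows:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3], "y": [4, 2, 6]})  # hypothetical data

# Row-wise condition written with an if-else expression (hypothetical rule)
def meets_condition(row):
    return True if row["x"] > 2 and row["y"] < 6 else False

mask = df.apply(meets_condition, axis=1)  # one boolean per row
print(mask.tolist())  # [False, True, False]
print(df[mask])       # keeps only the rows where mask is True
```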
In addition, it defines a subset of variables of interest.
```
df
```
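Selecting rows and a subset of variables at the same time can be sketched with `.loc`, which accepts a row selector and a list of column names together (the data and the chosen columns are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 5, 3],
    "y": [4, 2, 6],
    "z": ["a", "b", "c"],
})  # hypothetical data

mask = df["x"] > 2        # rows of interest
variables = ["x", "z"]    # variables (columns) of interest

# .loc takes a row selector and a column selector together
subset = df.loc[mask, variables]
print(subset)
```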
3. We can use the `.apply()` method to modify rows or columns as a whole.
```
df.apply(update_row, axis=1)
```
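The body of `update_row` is not part of this excerpt, so here is a sketch with a hypothetical row function, `cap_y_at_x`, to show the mechanism: with `axis=1`, each row arrives as a Series, and returning a modified Series rebuilds that row:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 3], "y": [4, 2, 6]})  # hypothetical data

# Hypothetical row-wise rule: cap each row's y at that row's x
def cap_y_at_x(row):
    row = row.copy()                    # avoid mutating the Series pandas passes in
    row["y"] = min(row["y"], row["x"])
    return row                          # returning a Series rebuilds the whole row

updated = df.apply(cap_y_at_x, axis=1)
print(updated)
```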
4. We can use the `.applymap()` method to modify every individual entry in the dataframe at once.
```
df
```
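A sketch of element-wise modification on hypothetical data. Note that pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name, so this sketch picks whichever method the installed version provides:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, -2], "y": [3, -4]})  # hypothetical data

def elementwise(frame, func):
    # pandas >= 2.1 provides DataFrame.map; older versions only have applymap
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)

squared = elementwise(df, lambda v: v ** 2)  # applied to every single entry
print(squared)
```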
Here, the `zip` function creates pairs of values at corresponding positions in the two lists (e.g. [0,3], [3,4] ...).
Reviewer suggestion: "Here, the zip function creates ..."
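A sketch of using `zip` to pair row and column positions and mark those entries as missing. The 5x5 frame and the position lists are hypothetical, chosen so the first pairs match the ones mentioned above:

```python
import numpy as np
import pandas as pd

# Hypothetical 5x5 frame of floats (NaN requires a float dtype)
df = pd.DataFrame(np.arange(25, dtype=float).reshape(5, 5))

row_positions = [0, 3]  # hypothetical positions
col_positions = [3, 4]

# zip pairs up elements at the same position: (0, 3), (3, 4)
pairs = list(zip(row_positions, col_positions))

for r, c in pairs:
    df.iloc[r, c] = np.nan  # set the entry at (row r, column c) to missing

print(df)
```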
```
df.applymap(replace_nan)
```
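The definition of `replace_nan` is not part of this excerpt. As an illustration only, here is a hypothetical element-wise function, `nan_to_zero`, that replaces each missing entry with `0.0` and leaves other entries unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, 4.0]})  # hypothetical data

# Hypothetical element-wise helper: replace a missing entry with 0.0
def nan_to_zero(value):
    return 0.0 if pd.isna(value) else value

# DataFrame.map on pandas >= 2.1, applymap on older versions
if hasattr(pd.DataFrame, "map"):
    cleaned = df.map(nan_to_zero)
else:
    cleaned = df.applymap(nan_to_zero)

print(cleaned)
```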
Pandas also provides convenient methods for replacing missing values.
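One such method is `DataFrame.fillna`. A sketch on hypothetical data, filling with a constant and then with each column's mean:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 4.0, 8.0]})  # hypothetical

# Replace every missing value with a constant
filled_zero = df.fillna(0)

# Replace missing values with each column's mean (NaNs are skipped when computing it)
filled_mean = df.fillna(df.mean())

print(filled_zero)
print(filled_mean)
```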
Nice work @HumphreyYang, many thanks. Please see the comments above. If you can, please find a way to check spelling and grammar. (One way to get both right is to write very short, simple sentences. You are doing well; please keep pushing in this direction.) @mmcky, I'll leave you to review and merge from here. Thanks.
Hi @jstac, Hi @mmcky,
```
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
```
Let's look at an example that reads data from the CSV file `pandas/data/test_pwt.csv`, which is taken from the [Penn World Tables](https://www.rug.nl/ggdc/productivity/pwt/pwt-releases/pwt-7.0).
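A self-contained sketch of the reading step with `pd.read_csv`. To avoid depending on the file being present, it parses an in-memory string instead. The data row is the Uruguay line quoted in this PR; the header line is an assumption about the file's column names:

```python
import io

import pandas as pd

# Assumed header; the data row is the Uruguay line quoted in the diff
csv_text = '''"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","5.108067988"
'''

# read_csv accepts any file-like object; quoted numbers are still parsed as numbers
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With the actual file available locally, `pd.read_csv("pandas/data/test_pwt.csv")` would be the direct equivalent, assuming the path is relative to the working directory.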
Reviewer comment: @HumphreyYang, thanks for linking this to PWT v7.0. It might be worth adding a small table with the variable names and their descriptions to make it crystal clear. What do you think?
mmcky left a comment:
Thanks @HumphreyYang, this is nice work. Thanks @jstac for your comments also.
I have just made one comment about defining the variable names in the Penn World Tables data sample to make that even clearer. We were confused when we talked on Zoom, so it's probably a good thing to do.
Once that is added I'll send a copy to Tom for comments.
Hi @mmcky, thank you for your comment. I have added the table in the latest commit. I omitted details such as "at current price" and "G-K method", since the table is only meant to give readers an idea of what these variables are. Do you think I should include these details? Thank you.
Just the meaning of each variable is the right call. Thanks @HumphreyYang
Thanks @mmcky. Could you please review the latest deployment and let me know if there is anything I need to mention or change?
Hi @jstac and @mmcky,
In this PR on pandas lecture, I have performed the following tasks:
During the revision, I found that our discussion of pandas could be expanded further. For example, I felt that content on data wrangling and data analysis with pandas dataframes could be covered in a separate chapter in future iterations.
Could you please kindly review these changes and provide some feedback on this version?