cs50.h defines type string to be a pointer to a string #163

Closed
rillig opened this issue Apr 14, 2019 · 13 comments

@rillig rillig commented Apr 14, 2019

The file cs50.h defines the type string as follows:

typedef char *string;

The C standard says in 7.1.1p1:

A string is a contiguous sequence of characters terminated by and including the first null character.
[…]
A pointer to a string is a pointer to its initial (lowest addressed) character.

What this header calls a string is actually a pointer to a string. This can lead to confusion. Especially in teaching, confusion should be avoided as far as possible.

To clarify this situation, the documenting comment should at least explain the reason why this type definition is necessary and useful, and what a string really is.

dmalan added a commit that referenced this issue Apr 15, 2019
Member

@dmalan dmalan commented Apr 15, 2019

Thanks for the citation, though this one I think we'll largely leave as is, though I've added a parenthetical to our comments. As defined, I do think string is a reasonable (and, daresay, commonly accepted) abstraction for a sequence of characters that just so happens to be represented underneath the hood as the address of a char. Indeed, even the man pages for libc use the terms interchangeably.

@dmalan dmalan closed this Apr 15, 2019

@Sebbyastian Sebbyastian commented Jun 20, 2019

This is an example of why I wrote on your subreddit that your lessons teach a poor understanding of the languages... I hope rillig will stop trying to help you, as I've decided, and instead focus on helping those you mislead.

"daresay, commonly accepted"

... by people who don't know C. This is a good excuse when you want those who actually know C to start mocking you. I wonder if you'd rather be oblivious to the mocking?

The address of a char is not necessarily a contiguous sequence of characters terminated by and including the first null character, hence you are actually seeding confusion.

"that just so happens to be represented underneath the hood as the address of a char"

... not to mention wchar_t, char16_t and char32_t... no, the only people on this planet are English-speakers, with 8-bit char types...

... and heaven forbid should you write a sequence of characters that terminates at the first '\0' (including the '\0') to a file (i.e. fwrite(str, strlen(str) + 1, 1, fd);)... what's the address for that string? Or is that not a string anymore? See, this is where the confusion arises. Either char * is synonymous with "string" (it isn't), and you don't have words to describe the actual representation... or you have a proper understanding.

"Indeed, even the man pages for libc use the terms interchangeably."

OpenGroup (and a version of Arch Linux) says:

The strlen() function shall compute the number of bytes in the string to which s points, not including the terminating NUL character.

OpenGroup decides what constitutes POSIX C, which is most likely what you want to try to fit your lessons into...

man7 (and Arch, Debian) says:

The strlen() function calculates the length of the string pointed to by s, excluding the terminating null byte ('\0').

You might find this kind of language clearer.

I think you're probably using an old Ubuntu Linux installation, or something else that isn't quite POSIX compliant, for your documentation... even Ubuntu is starting to pick up on this mistake, though to be clear they've had POSIX-compliant manuals for years now (1, 2, 3, 4, 6, 7)... from this line of research it seems to me like in the latter versions of Ubuntu they've patched all their documentation (1, 2, 3); you should start to see the correct language in your manpages starting with 18.04. It'd be a shame if everyone else moved on from this mistake, except for you... a shame for you and your students.

Author

@rillig rillig commented Jun 20, 2019

@Sebbyastian I totally agree. I just don't understand why I should only repair the damage done by these ignorant teachers instead of fixing the teachers' knowledge. The latter is a one-time operation, and I think it is more effective than correcting 1000 Stack Overflow users each year who wrongly trusted their teachers. O(1) versus O(fib(years passed by)). :)


@Sebbyastian Sebbyastian commented Jun 21, 2019

@rillig In that case, it would make more sense to start persuading universities to hire professors who actually know C to teach C (or hey, here's a better idea, just convince them to back away from C)... this way you turn an operation that occupies O(n m) into an O(n) operation, right? That is, assuming the operation that is persuasion is O(1) in time (that's a whole other problem which we need to deal with as a society).

I will give him one thing... at least he's humble enough to close the issue. Some of the professors on our planet would no doubt be adamant that the lines of code aren't necessary... he he... hopefully he starts looking at ways to fix his C-related resource (or eliminate it from the course entirely).


@Sebbyastian Sebbyastian commented Jun 21, 2019

Hell, he could teach C++ instead. There are introductory books written for C++, by the founding forefather at that... whereas no such textbooks exist for C. This ought to be a big hint... C is not particularly introductory-friendly.

typedef char *string only pushes the issue of serialisation and deserialisation of files into the realm of confusing as hell. Most of your students won't bother to look deep into the library (and so won't notice the level of indirection that they need to serialise/deserialise). I should hope you've accommodated for this in your lesson plans, but then again, I stopped auditing your lessons because they're full of platform-specific nonsense like the sizeof an int (which you've previously acknowledged but refused to fix, because you don't want to make your programming lessons about a specific platform... lol).


@ztane ztane commented Aug 9, 2019

@Sebbyastian krhm... the bible of C language before standardization was "The C programming language" by K&R, an introductory book, where R was the founding forefather of the language; this predated the invention of C++, let alone any introductory books. Unfortunately R passed away 8 years ago so he's not here to defend his language against your attack.


@JoelKatz JoelKatz commented Aug 19, 2019

I have seen this particular typedef confuse beginners more than once. The important thing to know is that this is a pointer and calling it a "string" confuses people. They expect its value to be the value of a string because it's called "string" and it's not. The only thing this does is hide that the type is a pointer and in order to use it, you must know that it's a pointer.

It does not help in any way. It doesn't save space or typing, "string" is actually longer than "char*". And in order to use it or tell if it's being used correctly, you have to know it's a pointer. So it doesn't hide any information that isn't always needed. Every C programmer must know what "char*" is or they can't understand this "typedef", so it doesn't avoid having to know or understand anything.

It literally serves no useful purpose whatsoever and does affirmative harm. What is the argument for keeping it?


@FreeER FreeER commented Aug 19, 2019

Y'all act as if this isn't revealed by the end of the course; it's shown by week 2 with the Caesar cipher and others. What it does is allow you to get user input in week 1 without having to explain why there are multiplication symbols in the code when you're dealing with text, not math, or why you have to use pointers and what the difference from an array is. All of that does of course get covered, but it doesn't need to be covered immediately, and this lets them hide it for just the barest introductory period. Nor is it particularly expected to be a library that a bunch of people use and then start typing string instead of char*. Remember, the most efficient way to program is not necessarily the most efficient way to teach/learn. The fact is they've had this for many years and it's working for them.
disclaimer: I'm just a random person who took CS50 a few years back on edx and enjoyed it once I got over trying to do everything on my own with just the lectures.


@JoelKatz JoelKatz commented Aug 19, 2019

@FreeER It's because we see the harm it does over and over. It may well be working for them, but it's producing students who don't understand why the value of something they called a "string" isn't a string. I haven't taken the CS50 course, but the evidence I've seen suggests that students do not come out of the course understanding that this is bad practice. At a minimum, there needs to be a big fat warning, because students can't be expected to figure this out on their own and experience says that they don't.

This particular pedagogical technique is doing harm to students.


@FreeER FreeER commented Aug 19, 2019

So you've seen the harm it's done to CS50 students when it's explained in a week or students who picked it up from elsewhere and may never have had it explained?
Keep in mind that it's just an introductory course for anyone, any CS inclined student should be taking several others afterwards.


@JoelKatz JoelKatz commented Aug 21, 2019

@FreeER People are only beginners for a very short time and you would be surprised how much of an impression the way they first learned things makes. It is perfectly fine to make things simpler for beginners, but making them confusing is a huge mistake. The main thing you need to know about this thing in order to use it is that it has the semantics of a pointer. You shouldn't teach beginners things they need to unlearn.


@FreeER FreeER commented Aug 21, 2019

Confusing them is inevitable. They aren't computers who are intuitively going to understand computers and the languages and tools invented to work with them even with explanations, they are going to get confused about things. At that point it's a question of do you ever simplify things, because if you do then the very fact that you are simplifying something means you aren't teaching everything and they are going to have to unlearn the simplified teaching at some point to learn the more complete and correct teaching. You can argue that some things can be simplified without being wrong and you can just add to it, but the fact is that they have to learn that the simplified version isn't the full story and have to relearn it to keep the extra stuff in mind so I don't see it as being significantly different. And at that point it's just a matter of teaching style, perhaps worth teachers arguing over which is more effective, throwing everything at students or simplifying, but not a matter of the code itself and so doesn't belong here.


@JoelKatz JoelKatz commented Aug 21, 2019

@FreeER "not a matter of the code itself and so doesn't belong here"

I don't agree. I can understand the logic that it's the fault of those who teach badly that this confusion keeps occurring again and again. But that is just wrong. It is the code's fault. Code that leads people down a path that is known by experts to cause problems without adequate (or any) warning is just bad code. And once you see the harm it does, the sensible thing is to fix it.

Sure, you could educate every teacher so that they educate every student. Or, you could change the code once so that it doesn't set a trap for the unwary.
