Cleaning an HTTP Chunked Answer in the Custom Reader Class to Get a Filtered JsonDocument #2105

Closed
ENG-WiPhilia opened this issue Jun 24, 2024 · 1 comment
Labels
v7 ArduinoJson 7

Comments

@ENG-WiPhilia

In my project, I needed to filter a long answer coming from an HTTP server over a TLS 1.2 encrypted connection. The HTTP 1.0 protocol chunks the text answer when the data is long. I developed the TLSCustomReader class to strip the HTTP headers and the chunk markers, leaving a continuous stream of acceptable characters for the JSON document so that the incoming data can be filtered correctly. Since this is not a web server but a very specific application server, I have to deal with the HTTP 1.0 protocol.
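For readers unfamiliar with the wire format: after the headers, a chunked body is a sequence of `<size in hex>\r\n<data>\r\n` blocks terminated by a zero-size chunk. The sketch below is a hypothetical helper in standard C++ (not the author's code and not part of ArduinoJson) that decodes a response already held entirely in memory. It assumes the whole response fits in RAM, which is exactly the limitation a streaming reader such as TLSCustomReader avoids.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ArduinoJson): removes the status line and the
// headers, then strips "Transfer-Encoding: chunked" framing from a response that is
// already completely in memory. Returns an empty string on malformed input.
std::string extractChunkedBody(const std::string& response) {
    // The header block ends with an empty line, i.e. "\r\n\r\n"
    std::size_t pos = response.find("\r\n\r\n");
    if (pos == std::string::npos) return "";
    pos += 4;

    std::string body;
    while (pos < response.size()) {
        // Each chunk starts with "<hex size>[;extensions]\r\n"
        std::size_t lineEnd = response.find("\r\n", pos);
        if (lineEnd == std::string::npos) return "";
        const std::string sizeLine = response.substr(pos, lineEnd - pos);
        char* endPtr = nullptr;
        unsigned long size = std::strtoul(sizeLine.c_str(), &endPtr, 16);
        if (endPtr == sizeLine.c_str()) return "";        // no hex digits
        pos = lineEnd + 2;
        if (size == 0) break;                              // last chunk (trailers ignored)
        if (pos + size + 2 > response.size()) return "";   // truncated chunk
        body.append(response, pos, size);
        pos += size + 2;  // skip the data and its trailing "\r\n" (not verified)
    }
    return body;
}
```

`strtoul` stops at the first non-hex character, so a chunk extension such as `3;ext=1` still yields the size 3.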

I am very impressed with your work on the ArduinoJson library. Really, I think it is the most organized and best-documented library I have used in my project. I do not know whether you have already worked on handling HTTP chunked data coming into a custom reader class. If you need an idea of how to do it, I am sharing my solution as a contribution to your project or to your library's users.

The solution in C++, implemented in the Arduino IDE:

```cpp
// JSON document holding the filter to apply to the incoming stream
JsonDocument hueFilter; // ArduinoJson v7
// JSON document receiving the (filtered) incoming stream data
JsonDocument doc; // ArduinoJson v7
// ******************************************************* //
// reset the received-bytes counter
// (txData is assumed to be a global size_t declared elsewhere)
// ******************************************************* //
txData = 0;
// Create the JSON custom reader that reads the TLS connection like a Stream
TLSCustomReader CustomReader(connectTLS, txData);
// Deserialize the JSON directly from the custom reader, applying the filter
DeserializationError error = deserializeJson(doc, CustomReader, DeserializationOption::Filter(hueFilter));
if (error) { // e.g. InvalidInput when the server answered with an HTML page
  dbgPrint_F("deserializeJson() failed: "); dbgPrintln(error.c_str());
}
```
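Per ArduinoJson's custom reader documentation, deserializeJson() can read from any object exposing `int read()` (next byte, or -1 at the end) and `size_t readBytes(char* buffer, size_t length)` (fill the caller's buffer, return the count); this is what TLSCustomReader provides. As a minimal sketch of that contract in standard C++, here is a hypothetical reader over an in-memory buffer. Note that read() returns the byte as `unsigned char`, so a 0xFF byte is not mistaken for the end of input.

```cpp
#include <cstddef>

// Hypothetical reader over an in-memory buffer, illustrating the two methods a
// custom reader offers to deserializeJson(): read() and readBytes().
class MemoryReader {
public:
    MemoryReader(const char* data, std::size_t size) : data_(data), size_(size) {}

    // Returns the next byte as 0..255, or -1 at the end of the input.
    int read() {
        if (pos_ >= size_) return -1;
        return static_cast<unsigned char>(data_[pos_++]);
    }

    // Copies up to `length` bytes into the caller's buffer; returns how many were copied.
    std::size_t readBytes(char* buffer, std::size_t length) {
        std::size_t n = 0;
        while (n < length && pos_ < size_) buffer[n++] = data_[pos_++];
        return n;
    }

private:
    const char* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

With ArduinoJson available, such an object would be passed exactly like `CustomReader` above, e.g. `MemoryReader r(json, len); deserializeJson(doc, r);`.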

Note that connectTLS was created beforehand [connectTLS = createTLSConnection(connectNum);]. I do not think it is required for implementing the custom reader object of this contribution, but I can share it if you need it.
connectTLS is used to send the request [ret = esp_tls_conn_write(connectTLS, httpRequest + written_bytes, strlen(httpRequest) - written_bytes)] and to receive the data in the custom reader class:

struct TLSCustomReader { // ArduinoJSON Custom Reader Class
// converts Stream-like read() calls into data readable by the JSON document,
// removing the HTTP headers and the HTTP/1.1 chunk information while refilling the buffer
// **********************************************************************
// This class is used to fill the ArduinoJson document.
// In version 7, the ArduinoJson document replaces manual String manipulation.
// Without filtering it needs more memory than a plain String, so filtering
// is required to reduce memory usage; this struct is therefore needed as a
// custom reader that feeds the filtered stream into the ArduinoJson document.
// *********************************************************************
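esp_tls_t* connect;
static const size_t bufferSize = 1024;
const char* chunkInd = "\r\n";
char buffer[bufferSize] = {};
char jsonBuf[bufferSize] = {};
char hexBuffer[16] = {};
size_t hexlength = 0;
size_t bytesRead = 0, bytesReady = 0, charsRead = 0;
size_t pos = 0, posTemp = 0;
// all flags start cleared; headers = true means the HTTP headers still have to be parsed
bool chunked = false, cutChunk = false, headers = true, finalPack = false;
unsigned long remainData = 0;
bool lookHTMLerr = false;

// note: totalBytes is not stored; refillBuffer() updates the global txData directly
explicit TLSCustomReader(esp_tls_t* TLSConnect, size_t &totalBytes)
    : connect(TLSConnect), bytesRead(0), pos(0), headers(true)  { (void)totalBytes; }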

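int read() {
    // return the byte as 0..255 so that a 0xFF byte is not mistaken for -1 (end of input)
    if (pos < bytesRead) {
        return static_cast<unsigned char>(jsonBuf[pos++]);
        } 
    else {
        refillBuffer();
        return (pos < bytesRead) ? static_cast<unsigned char>(jsonBuf[pos++]) : -1;
        }
    }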

size_t readBytes(char* buffer, size_t length) {
    // copies into the caller's buffer (this parameter shadows the member 'buffer')
    size_t count = 0;
    while (length > 0) {
        if (pos < bytesRead) {
            buffer[count++] = jsonBuf[pos++];
            --length;
            } 
        else {
            refillBuffer();
            if (pos >= bytesRead) {
                // No more data to read
                break;
                }
            }
        }
    return count;
    }

bool available() const {
    return pos < bytesRead;
    }  

private:

void verifyHTMLerror (void) {
   /**********************************************************************************************
   *         Procedure looking for HTML (DOCTYPE) error pages in the current jsonBuf             *
   ***********************************************************************************************/
   // The HTML error markup <div class="error"> error_description </div> is present in the HTML
   // response when the server cannot build the JSON content for the command received.
   //    Note: the entire HTML response is assumed to be in the first jsonBuf received

   String readDoc = String(jsonBuf);
   
   int errStr = readDoc.indexOf("<body>");// HTML DOCTYPE: <html><head>...</head><body>...</body></html>
   changeHttpErr = 0;
   if (errStr>0) { // is an HTML text (not a JSON document -> Invalid Input -> jsonDocument == null)
      switch(httpResp) { // any httpResp 
         // httpResp != 200: the response content is an HTML page containing the error description
         // case 400: {} break; // if required, must be studied
         // case 401: {} break; // if required, must be studied
         // case 402: {} break; // if required, must be studied
         case 403: { // http error 403: forbidden
            int errStr = readDoc.indexOf("<div class=\"error\">");
            if (errStr>0) { // message error into html response page
               String errDesc = readDoc.substring(errStr + 19, readDoc.length());
               errStr=0;
               int errEnd = errDesc.indexOf("</div>");
               errDesc = errDesc.substring(0,errEnd);
               dbgPrint_F("HTML ["); dbgPrint(httpResp);
               dbgPrint_F("] error description: ");dbgPrintln(errDesc);
               errStr = errDesc.indexOf("no lighting here");
               if (errStr >= 0) { // indexOf() returns -1 only when the text is absent
                  changeHttpErr = 173; 
                  //dbgPrint_F("Html-Error-Change: ");dbgPrintln(changeHttpErr);
                  break; } 
               } // message error into html response page
            else { // html  response does not have "<div class=\"error\">"
               dbgPrint_F("HTML ["); dbgPrint(httpResp);
               dbgPrint_F("] response into no-JSON document!");
               changeHttpErr = 174; // HUE error response!
               } // html  response does not have "<div class=\"error\">"
            } break;// http error 403: forbidden
         // case 404: {} break; // if required, must be studied
         // ...
         // case 507: {} break; // if required, must be studied
         default :  { // for now, all other html errors are treated like 403
            int errStr = readDoc.indexOf("<div class=\"error\">");
            if (errStr>0) { // message error into html response page
               String errDesc = readDoc.substring(errStr + 19, readDoc.length());
               errStr=0;
               int errEnd = errDesc.indexOf("</div>");
               errDesc = errDesc.substring(0,errEnd);
               dbgPrint_F("HTML ["); dbgPrint(httpResp);
               dbgPrint_F("] error description: ");dbgPrintln(errDesc);
               errStr = errDesc.indexOf("no lighting here");
               if (errStr >= 0) { // indexOf() returns -1 only when the text is absent
                  changeHttpErr = 173; 
                  //dbgPrint_F("Html-Error-Change: ");dbgPrintln(changeHttpErr);
                  break; } 
               } // message error into html response page
            else { // html  response does not have "<div class=\"error\">"
               dbgPrint_F("HTML ["); dbgPrint(httpResp);
               dbgPrint_F("] response into no-JSON document!");   
               changeHttpErr = 174; 
               } // html  response does not have "<div class=\"error\">"
            } // any other httpResp: for now, all html errors are treated like 403
         } // any httpResp
      } // is an HTML text (not a JSON document -> Invalid Input -> jsonDocument == null)
   } // verifyHTMLerror(void)

void refillBuffer() {

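   // fill buffer with data from the TLS connection
   if (headers) { // first packet to buffer (it contains the HTTP headers)
       ssize_t n = esp_tls_conn_read(connect, buffer, bufferSize-1);
       charsRead = (n > 0) ? (size_t)n : 0; // a negative return is an error: treat it as no data
       buffer[charsRead] = '\0';
       skipHttpHeaders(); // eliminate headers, detect HTTP type and chunked transmission
       if (chunked) processChunked(); // eliminate chunk indicators and get last chunk size (remainData)
       }
   memcpy(jsonBuf, buffer, bufferSize); 
   bytesReady = charsRead;      
   memset(buffer, '\0', charsRead);
   charsRead = 0; // the data now lives in jsonBuf; prevents delivering it twice after the final pack
   //at this point -> buffer = empty // jsonBuf = full
   if (!finalPack) { // still data to get
      ssize_t n = esp_tls_conn_read(connect, buffer, bufferSize-1);
      charsRead = (n > 0) ? (size_t)n : 0;
      if (chunked) processChunked(); // eliminate chunk indicators and get last chunk size (remainData)
      else { // Content-Length: subtract the bytes just read, without wrapping below zero
         remainData = (charsRead >= remainData) ? 0 : remainData - charsRead;
         finalPack = (remainData == 0);
         }
      } 
   if ((charsRead==0)&&(!finalPack)) { // no payload in buffer, but the final pack is still expected
      // keep reading until payload arrives or the last chunk is seen
      while ((charsRead==0) && (!finalPack) && chunked) {
         ssize_t n = esp_tls_conn_read(connect, buffer, bufferSize-1);
         if (n <= 0) break; // connection closed or error: stop instead of looping forever
         charsRead = (size_t)n;
         processChunked(); // eliminate chunk indicators and get last chunk size (remainData)
         }
      if (charsRead == 0) finalPack = true; // nothing more will arrive
      }
   //at this point -> buffer = full // jsonBuf = full 
   dbgPrint_F("JSON Buffer : |");dbgPrint(jsonBuf);dbgPrintln("|");  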

   /************************************************************************************
   *_____________additional: process an HTML page when the answer is not a JSON doc!!!_*
   *                                                                                   *
   ************************************************************************************/
   //** Note: if chunked, the HTML error page must be received in the first jsonBuf  **/
   //---------------------------------------------------------------------------------//
   
   if (lookHTMLerr) { // check the http error content only in the first jsonBuf
      verifyHTMLerror();
      lookHTMLerr = false;
      } // check the http error content only in the first jsonBuf

   /***********************************************************************************
   *__________________________________________________________________________________*
   *                                                                                  *
   ***********************************************************************************/

   txData += bytesReady; 
   bytesRead = bytesReady;
   pos = 0;
   }

void processChunked(void) { // cut the chunk information "\r\n hexChunk \r\n" out of buffer

   char* found;
   // note: the chunk sequence "\r\n hexCount \r\n" has to be
   // extracted from the data flow going to the JSON document (buffer),
   // but the chunk sequence can be split between one buffer block
   // and the next, so the presence of '\n' or '\r' chars must be checked;
   // if present, the last incomplete chunk information is completed
   // with the first bytes of the next buffer
   if (cutChunk) {
      size_t xlength;
      //look for last part of chunkInd charged in new buffer
      found = strstr(buffer, chunkInd);
      xlength = (found-buffer+2); 
      byte n = 1;
      while ((hexlength*n - xlength) == 2) { // skip consecutive "\r\n" sequences until data
         found = strstr(buffer + xlength, chunkInd);
         xlength = (found-buffer+2);
         n++;
         }     
      memcpy(hexBuffer + hexlength, buffer, xlength);  
      // now hexBuffer = "\r\n hexCount \r\n"
      /*********************************************************
      *                getChunkedSize                          *
      **********************************************************/
      hexlength += xlength;
      hexBuffer[hexlength] = '\0';
      // Convert the variable hex sequence to a numeric value
      remainData = strtoul(hexBuffer+2, nullptr, 16);
      // Now you have the numeric value of the variable hex sequence;
      // the data for the JSON document starts right after it
      cutChunk = false;
      charsRead -= xlength;
      // move the rest of the buffer to the origin, cutting the chunk indicator
      memmove(buffer, buffer + xlength, charsRead);
      // put '\0' (NUL) in the vacated bytes (charsRead .. bufferSize)
      memset(buffer + charsRead, '\0', bufferSize - (charsRead));
      if ((remainData == 0) && (hexlength>4)) finalPack = true;
      }
   found = strnstr(buffer, chunkInd, charsRead);
   char* start;
   char* end;
   byte iter = 1;  
   while (found != nullptr) { // chunk Start Sequence into the buffer
      // Calculate the start position of the variable hex sequence
      // Find the end Sequence of the variable hex sequence
      byte n = 1;
      end = strstr(found + 2*n, chunkInd);
      if (end != nullptr) {
         // Calculate the length of the variable hex sequence
         hexlength = end - found + 2*n;
         while ((end - found) == 2*n) { // skip consecutive "\r\n" sequences until data
             n++;
             end = strstr(found+ 2*n, chunkInd);
             if (end==nullptr) break; // Double Chunk Sequence Detected
             hexlength = end - found + 2*n;
             }
          }
      if (end == nullptr) { // end sequence not found: the indicator is cut at the end of buffer
         // "\r\n" (possibly repeated) without hex digits can only appear at the end of the buffer
         cutChunk = true; // the chunk indicator continues in the next buffer
         hexlength = buffer + charsRead - found;
         strncpy(hexBuffer, found, hexlength);
         charsRead -= hexlength;
         memset(buffer+charsRead, '\0', hexlength); // clear the Buffer last hexlength places        
         break; 
         }  
      // Extract the variable hex sequence (bounded by the size of hexBuffer)
      size_t copyLen = (hexlength < sizeof(hexBuffer)) ? hexlength : sizeof(hexBuffer) - 1;
      strncpy(hexBuffer, found, copyLen);
      /*********************************************************
      *                 getChunkedSize                         *
      **********************************************************/
      hexBuffer[copyLen] = '\0'; // terminate right after the copied characters
      // Convert the variable hex sequence to a numeric value
      remainData = strtoul(hexBuffer+2*n, nullptr, 16);
      // Now, you have the numeric value of the variable hex sequence
      memset(hexBuffer, '\0', sizeof(hexBuffer)); // clear the whole hexBuffer
     
      if ((remainData == 0) && (hexlength>0)) finalPack = true;
      // Shift the remaining characters to remove the sequence
      size_t endsection = bufferSize - (found - buffer + hexlength);
      memmove(found, found + hexlength, endsection);
      charsRead -= hexlength;
      // put '\0' (NUL) in the vacated bytes (charsRead .. bufferSize)
      memset(buffer + charsRead, '\0', bufferSize - (charsRead));
      // look for the next chunk indicator
      found = strnstr(buffer, chunkInd, charsRead);   
      iter++;
      } // chunk Start Sequence into the buffer
   // a chunk indicator may start at the very end of the buffer:
   // check whether the last char of the charsRead section is '\n'
   if ((charsRead > 0) && (buffer[charsRead-1] == '\n')) {
      hexBuffer[0] = '\n'; hexBuffer[1] = '\r'; hexlength = 2;       
      cutChunk = true;
      buffer[charsRead-1] = '\0'; charsRead--; 
      }
   // check whether the last char of the charsRead section is '\r'
   if ((charsRead > 0) && (buffer[charsRead-1] == '\r')) {
      hexBuffer[0] = '\n'; hexBuffer[1] = '\r'; hexlength = 2;       
      cutChunk = true;
      buffer[charsRead-1] = '\0'; charsRead--; 
      }
   } // cut the chunk information to buffer

void skipHttpHeaders(void) {
   size_t position = 0;
   lookHTMLerr = false;
   // Assumes the complete HTTP header is in the buffer; both loops stop at charsRead
   while (position < charsRead) { // one header line per iteration
      httpHeadln = ""; //clear line
      while (position < charsRead) { // fill httpHeadln with chars until '\r'
         char c = buffer[position++];
         if (c == '\r' || c == '\0') {
             // End of line (or unexpected end of data)
             break;
             }
         httpHeadln += c;
         } // fill httpHeadln with chars until '\r'
      position++; // consume the '\n' that follows '\r'
      // dbgPrintln(httpHeadln);
      if (httpHeadln.startsWith("HTTP/1.")) { // status line of an HTTP/1.0 or HTTP/1.1 response
         // Check the HTTP Response
         httpResp = httpHeadln.substring(9,12).toInt(); // 200 -> OK or other
         if (httpResp!= 200 ) {
            dbgPrint_F("Error in HTTP response : [");
            dbgPrint(httpResp);dbgPrintln("]");
            lookHTMLerr = true;
            // do not break: keep receiving the whole response, because
            // the error description may be JSON content in the buffer,
            // or there may be an HTML page with the error message (not JSON content);
            // for non-JSON content, verifyHTMLerror() handles the error
            }
         }
      if (httpHeadln.isEmpty()) break; // Empty line indicates end of headers
      if (httpHeadln.startsWith("Transfer-Encoding: chunked")) chunked = true;
      if (httpHeadln.startsWith("Content-Length:")) {
           String lineLength = httpHeadln.substring(16); // value after "Content-Length: "
           remainData = lineLength.toInt();
           chunked = false;
           }    
       }
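   // move all the buffer's data to the origin, cutting the http headers
   if (position > charsRead) position = charsRead; // never cut more than was read
   if (!chunked) {
      charsRead -= position; 
      // Content-Length: subtract the body bytes already received, without wrapping below zero
      remainData = (charsRead >= remainData) ? 0 : remainData - charsRead;
      finalPack = (remainData == 0);
      } 
   else {   
     if (position >= 2) position -= 2; // keep the "\r\n" before the first chunk size for processChunked()
     charsRead -= position;
     finalPack = false;
     cutChunk = false;
     }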
   memmove(buffer, buffer + position, charsRead);     
   headers = false; 
   // put '\0' (NUL) in the vacated bytes (charsRead .. bufferSize)
   memset(buffer + charsRead , '\0', bufferSize - (charsRead));
   } 
}; // ArduinoJSON Custom Reader Class
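Both the `case 403` branch and the `default` branch of verifyHTMLerror() extract the same text between `<div class="error">` and `</div>`. A hypothetical helper like the one below (written with `std::string`; with Arduino `String` the same logic maps onto `indexOf()`/`substring()`) would remove that duplication, and it uses the marker's length instead of the hard-coded offset 19. It returns an empty string when the marker is missing.

```cpp
#include <string>

// Hypothetical helper: returns the text between <div class="error"> and the next
// </div>, or an empty string if the marker is absent.
std::string extractHtmlError(const std::string& html) {
    const std::string open = "<div class=\"error\">";
    std::size_t start = html.find(open);
    if (start == std::string::npos) return "";
    start += open.size();                               // skip the marker itself
    std::size_t end = html.find("</div>", start);
    if (end == std::string::npos) end = html.size();    // unterminated: take the rest
    return html.substr(start, end - start);
}
```

The caller can then test `desc.find("no lighting here") != std::string::npos` to choose between the 173 and 174 error codes.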

I implemented this solution, and after many months and hundreds of streams received with it, I have had no errors in the received data. I know this is a lot of information, but feel free to contact me if you need details or clarification on any of the content above.

Juan Carlos Gomez Casal
jcgcasal@gmail.com

Note: I can also write/speak in French if you prefer.

@bblanchon bblanchon added v7 ArduinoJson 7 and removed enhancement labels Jun 25, 2024
@bblanchon
Owner

Hi Juan Carlos,

Thank you very much for sharing this.

I'm assuming you're referring to HTTP's Chunked Transfer Encoding.
This encoding was added in version 1.1 of the protocol, so your server must not use it if the client uses HTTP/1.0.
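Because chunked framing is an HTTP/1.1 feature, a client cannot always assume it; it can decide how the body is delimited from the response headers. The sketch below is a hypothetical helper in standard C++ (not part of ArduinoJson or StreamUtils). Unlike the exact `startsWith("Transfer-Encoding: chunked")` test in skipHttpHeaders(), it compares header names case-insensitively, as HTTP requires, and lets Transfer-Encoding take precedence over Content-Length. With neither header, an HTTP/1.0-style body ends when the server closes the connection.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class BodyFraming { Chunked, ContentLength, UntilClose };

// Returns a lowercase copy of `s` (ASCII only).
static std::string toLowerAscii(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: decides how the response body is delimited, given the header
// lines (without their "\r\n"). Header names and the "chunked" coding are matched
// case-insensitively; Transfer-Encoding takes precedence over Content-Length.
BodyFraming detectFraming(const std::vector<std::string>& headerLines) {
    bool chunked = false;
    bool hasLength = false;
    for (const std::string& line : headerLines) {
        const std::string l = toLowerAscii(line);
        if (l.rfind("transfer-encoding:", 0) == 0 && l.find("chunked") != std::string::npos)
            chunked = true;
        if (l.rfind("content-length:", 0) == 0)
            hasLength = true;
    }
    if (chunked) return BodyFraming::Chunked;
    if (hasLength) return BodyFraming::ContentLength;
    return BodyFraming::UntilClose;  // body ends when the server closes the connection
}
```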

I promised I would add an adapter to handle chunked transfer encoding in my StreamUtils library, but I never got the time to do it. I'll try to allocate some time after the release of 7.1.
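The core of such an adapter is a decoder that remembers its position in the chunk framing between reads. As a sketch only (a hypothetical class, not the StreamUtils API), the state machine below consumes the body one byte at a time after the headers have been removed. Because its state survives between calls, a size line or CRLF split across two TLS reads needs no special handling, unlike the cutChunk/hexBuffer bookkeeping in processChunked(). It skips chunk extensions, does not read trailers, and rejects size fields longer than 8 hex digits (32 bits).

```cpp
#include <cctype>
#include <string>

// Hypothetical decoder for "Transfer-Encoding: chunked" bodies (HTTP headers already
// removed). All state lives in the object, so input can be fed in pieces of any size.
class ChunkedDecoder {
public:
    // Feeds one byte; payload bytes are appended to `out`.
    // Returns false once the zero-size chunk was seen (done()) or framing is invalid (failed()).
    bool feed(char ch, std::string& out) {
        const unsigned char c = static_cast<unsigned char>(ch);
        switch (state_) {
        case State::Size:
            if (std::isxdigit(c)) {
                if (++digits_ > 8) { state_ = State::Failed; break; }  // over 32 bits
                const int d = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
                size_ = size_ * 16 + static_cast<unsigned long>(d);
            } else if (digits_ > 0 && c == ';') {
                state_ = State::Extension;   // chunk extension: skipped
            } else if (digits_ > 0 && c == '\r') {
                state_ = State::SizeLF;
            } else {
                state_ = State::Failed;
            }
            break;
        case State::Extension:
            if (c == '\r') state_ = State::SizeLF;
            break;
        case State::SizeLF:
            if (c != '\n') state_ = State::Failed;
            else state_ = (size_ == 0) ? State::Done : State::Data;  // trailers are not read
            break;
        case State::Data:
            out.push_back(ch);
            if (--size_ == 0) state_ = State::DataCR;
            break;
        case State::DataCR:
            state_ = (c == '\r') ? State::DataLF : State::Failed;
            break;
        case State::DataLF:
            if (c == '\n') { state_ = State::Size; digits_ = 0; }
            else state_ = State::Failed;
            break;
        case State::Done:
        case State::Failed:
            break;
        }
        return state_ != State::Done && state_ != State::Failed;
    }

    bool done() const { return state_ == State::Done; }
    bool failed() const { return state_ == State::Failed; }

private:
    enum class State { Size, Extension, SizeLF, Data, DataCR, DataLF, Done, Failed };
    State state_ = State::Size;
    unsigned long size_ = 0;
    int digits_ = 0;
};
```

A reader's read() could call feed() for each byte obtained from esp_tls_conn_read() and hand payload bytes to ArduinoJson as they appear.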

Best regards,
Benoit

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 10, 2024